High-Availability Kubernetes on Hetzner with Talos 1.11

Stop Overpaying for Cloud: High-Availability Kubernetes on Hetzner with Talos 1.11

If you run production workloads like Mastodon, Odoo, or a fleet of WordPress sites, you might think you need to stick with the major hyperscalers. You don’t need to burn money on AWS or Google Cloud just to get reliability. You can build an enterprise-grade Kubernetes cluster on Hetzner Cloud for a fraction of the cost.

In this guide, we will deploy two different architectures using Talos Linux 1.11 and Cilium (managed via OpenTofu). Whether you need a beast of a cluster for databases or a cost-effective setup for static sites, we have you covered.

Grab your €20 Free Credit on Hetzner Cloud here to follow along.

The “Business Powerhouse”: 3x CX53 Converged

Best for: Production Apps, Mastodon Instances, Odoo ERP, High-Traffic WordPress Networks.

This is the serious setup. We are using CX53 nodes, which are absolute monsters for the price. By running a “Converged” setup (where every node is both a control plane and a worker), we get high availability (HA) without dedicating servers to the control plane. The control plane components still take a small share of each node, but the rest of the RAM goes to our applications instead of sitting on idle management nodes.

The Hardware Specs

  • Nodes: 3x CX53 (16 vCPU, 32GB RAM, 320GB NVMe)
  • Networking: 1x Load Balancer (LB11) for the API and 1x Floating IP for the Gateway.
  • Storage: Hetzner Object Storage (S3 Compatible) for media assets.

Monthly Cost Breakdown

  • 3x CX53 Nodes (@ €17.49/mo): €52.47
  • 1x Load Balancer (LB11): €5.39
  • 1x Floating IP (IPv4) for Gateway: €3.60
  • 1x Object Storage (1TB included): ~€5.00
  • TOTAL: ~€66.46 / month

For roughly €66 a month, you get 48 vCPUs and 96GB of RAM across the three nodes (3 × 16 vCPU, 3 × 32GB). If you tried getting that on AWS, you would be paying over €400.
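The arithmetic behind that total is easy to check. Here is a minimal Python sketch, assuming the prices and specs quoted in this section. The names `LINE_ITEMS` and `monthly_total` are made up for illustration and are not part of any Hetzner tooling.

```python
from decimal import Decimal

# Specs and prices (EUR/month) as quoted in this guide; all names are illustrative.
NODE_COUNT = 3
VCPU_PER_NODE = 16
RAM_GB_PER_NODE = 32

LINE_ITEMS = {
    "3x CX53 nodes": Decimal("17.49") * NODE_COUNT,
    "Load Balancer (LB11)": Decimal("5.39"),
    "Floating IP (IPv4)": Decimal("3.60"),
    "Object Storage (approx.)": Decimal("5.00"),
}


def monthly_total(items):
    """Sum the monthly line items in EUR, using Decimal to avoid float rounding."""
    return sum(items.values(), Decimal("0"))


total = monthly_total(LINE_ITEMS)
total_vcpu = NODE_COUNT * VCPU_PER_NODE
total_ram_gb = NODE_COUNT * RAM_GB_PER_NODE

print(f"~EUR {total}/month for {total_vcpu} vCPU and {total_ram_gb} GB RAM")
```

Swap in your own node count or prices to re-run the numbers for a different cluster size.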

Real-World Use Case: Mastodon & Odoo

With this much power, you can comfortably host a large Mastodon instance. The trick is to configure Mastodon to use Hetzner’s S3-compatible Object Storage for all media files, which keeps your local NVMe usage low. At the same time, you could run an Odoo ERP system with a replicated PostgreSQL cluster and still have room for 10-20 heavy WordPress sites.

Sign up now to get your €20 credit and build this beast.


The “Indie Hacker”: 1 CP + 3 Workers (CX23)

Best for: Single Page Apps (SPA), Static Sites, Dev Environments, Low-Traffic APIs.

Alternatively, if you don’t need HA for the control plane and just want a cheap place to host React/Vue apps or static sites, this tiered setup offers unbeatable value. Here, we use one node to manage the cluster while three nodes handle the actual work.

The Hardware Specs

  • Control Plane: 1x CX23 (2 vCPU, 4GB RAM)
  • Workers: 3x CX23 (2 vCPU, 4GB RAM)
  • Networking: Direct ingress (No Load Balancer needed for simple setups; instead, just point DNS to a worker or use a Floating IP).

Monthly Cost Breakdown

  • 4x CX23 Nodes (@ €3.49/mo): €13.96
  • 1x Floating IP (IPv4): €3.60
  • TOTAL: ~€17.56 / month

Therefore, for less than the price of Netflix, you have a 4-node Kubernetes cluster capable of hosting hundreds of static sites or SPAs.

Start your Indie Hacker journey with €20 free credit.


Infrastructure as Code: OpenTofu & Cilium

For deployment, we use OpenTofu (the open-source fork of Terraform) to provision the infrastructure. Additionally, we will enable the Gateway API feature in Cilium 1.16+ to handle traffic routing efficiently.

Why use the “Debian 12” image? Hetzner Cloud does not offer a native Talos Linux image through its API, so we use a “Bootstrap” strategy: we provision a standard Debian 12 server and use a script to wipe the disk and install Talos automatically on the very first boot.

main.tf (Converged CX53 Example)

Setting up Cilium & Gateway API

Once Talos 1.11 is bootstrapped, install Cilium with Gateway API enabled. This step effectively replaces the legacy Ingress Controller.

Finally, configure the Hetzner Cloud Controller Manager to bind your Floating IP to the Cilium Gateway LoadBalancer service. This ensures that if one of your massive CX53 nodes reboots, traffic shifts to another node instead of going dark.


Storage Strategy: Keeping Data Safe

You might be wondering: “If a node dies and my database moves to another server, what happens to my data?”

If you rely on the local NVMe disk, that data is gone. However, Hetzner has a native solution called Hetzner Cloud Volumes that we automate using the Container Storage Interface (CSI).

💰 Cost Note: Cloud Volumes (Block Storage) cost extra, but they are cheap. You pay roughly €0.044 per GB/month, so a 10GB volume for your database will only cost you about €0.44/month.

1. Block Storage (Databases & PVCs)

For applications like PostgreSQL, MySQL, or WordPress uploads, we use the Hetzner CSI Driver. This allows Kubernetes to provision persistent volumes that live outside your servers.

How it works: If Node A fails, Kubernetes sees that your database pod is down. It automatically detaches the storage volume from Node A and reattaches it to Node B before starting the pod there. Your data survives the crash intact.

2. Object Storage (Media & Backups)

For “unstructured” data like Mastodon media files, user avatars, or Nextcloud backups, do not use Block Storage. It is expensive and hard to resize.

Instead, use Hetzner Object Storage (S3 Compatible). It is dirt cheap (~€5/TB), infinitely scalable, and accessible from any node instantly without waiting for volumes to mount/unmount.
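To see why Block Storage is the expensive choice for bulk media, compare the two rates quoted in this guide. The Python sketch below assumes decimal terabytes (1 TB = 1000 GB) and simple linear pricing for Object Storage, which real billing may not follow exactly. The function names are illustrative.

```python
from decimal import Decimal

# Rates quoted in this guide (EUR per month); both are approximate.
VOLUME_EUR_PER_GB = Decimal("0.044")  # Cloud Volumes (Block Storage)
OBJECT_EUR_PER_TB = Decimal("5")      # Object Storage, roughly
GB_PER_TB = 1000                      # assumption: decimal terabytes


def volume_cost(size_gb):
    """Monthly cost of a Cloud Volume of the given size."""
    return VOLUME_EUR_PER_GB * size_gb


def object_cost(size_gb):
    """Monthly Object Storage cost, assuming linear per-TB pricing."""
    return OBJECT_EUR_PER_TB * size_gb / GB_PER_TB


for size_gb in (10, 1000):
    print(
        f"{size_gb} GB: volume ~EUR {volume_cost(size_gb):.2f}, "
        f"object ~EUR {object_cost(size_gb):.2f}"
    )
```

At database sizes the volume costs pennies. At a terabyte of media, the same rate works out to roughly nine times the Object Storage figure.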

Summary: Where to put data?

  • Postgres/MySQL DB: Hetzner Cloud Volume (via CSI).
  • Redis/Cache: Local NVMe (Ephemeral).
  • User Uploads/Media: Hetzner Object Storage (S3).
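
The same rule of thumb can be written as a tiny decision function. This is only a sketch of the summary above. The `Backend` enum and `choose_backend` helper are hypothetical names, not part of any Hetzner or Kubernetes API.

```python
from enum import Enum


class Backend(Enum):
    CLOUD_VOLUME = "Hetzner Cloud Volume (via CSI)"
    LOCAL_NVME = "Local NVMe (ephemeral)"
    OBJECT_STORAGE = "Hetzner Object Storage (S3)"


def choose_backend(must_survive_node_loss: bool, is_unstructured_media: bool) -> Backend:
    """Map a workload's data to a storage backend, following the summary above."""
    if is_unstructured_media:
        # Uploads, avatars, backups: cheap and reachable from every node.
        return Backend.OBJECT_STORAGE
    if must_survive_node_loss:
        # Databases and other PVC-backed state.
        return Backend.CLOUD_VOLUME
    # Caches that can be rebuilt after a crash.
    return Backend.LOCAL_NVME


print(choose_backend(must_survive_node_loss=True, is_unstructured_media=False).value)
```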

Disaster Recovery: What Happens When a Server Dies?

Hardware failures happen. The main difference between our two setups is how much sleep you lose when they do.

Scenario A: The “Indie Hacker” (1 CP + 3 Workers)

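If a Worker Node fails: No panic. Kubernetes detects the failure (usually within 40-60 seconds), marks the node as NotReady, and automatically reschedules your pods to the remaining 2 workers. Keep in mind that with default settings, pods on the dead node are only evicted after a toleration window of about five minutes, so run at least two replicas spread across workers if you want your site to stay online in the meantime.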

If the Control Plane fails: You have a problem. The Kubernetes API goes down. Your existing apps will keep running, but you cannot deploy updates, change configurations, or fix broken pods until you restore the Control Plane from a backup.

Scenario B: The “Business Powerhouse” (3 Converged)

If ANY Node fails: It is a non-event. Because we run 3 Control Plane nodes, the cluster maintains “quorum” (2 out of 3 votes). The Hetzner Load Balancer instantly detects the dead node and stops sending API traffic to it. Your apps get rescheduled to the surviving 2 nodes automatically. You likely won’t even notice until you check your alerts.
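
The difference between the two scenarios comes down to majority quorum in etcd, the datastore behind the Kubernetes API. A minimal Python sketch of that arithmetic (the function names are illustrative):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of control plane nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many control plane nodes can fail while a majority remains."""
    return members - quorum(members)


for n in (1, 3):
    print(
        f"{n} control plane node(s): quorum = {quorum(n)}, "
        f"tolerated failures = {tolerated_failures(n)}"
    )
```

With one control plane node, any failure takes the API down. With three, the cluster can lose one node and keep a 2-of-3 majority.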

How to Fix a Broken Node (The “GitOps” Way)

Since we are using OpenTofu (Terraform), we don’t fix servers; we replace them. If a node dies, you simply tell OpenTofu to destroy and recreate it. Talos will automatically boot on the fresh server and rejoin the cluster.

This “Cattle, not Pets” approach is why we love this stack. You never SSH in to fix a broken driver. You just nuke the node and let automation build a new one.

Conclusion

To summarize, Hetzner Cloud combined with Talos Linux is a cheat code for infrastructure. You get strong performance with the flexibility of the cloud, all at prices that make the hyperscalers look ridiculous.

Ready to deploy? Then don’t forget to claim your startup credits below.


Get €20 Cloud Credits & Start Building

(Valid for all Hetzner Cloud products)
