Building a dedicated Kubernetes cluster on Hetzner

Seandon Mooy
Dec 3, 2020

Kubernetes works great anywhere (Raspberry Pis are our favorite) and can turn any pile of hardware into a production-ready cluster. Today we're going to talk about how to build a Kubernetes cluster using Hetzner dedicated servers. Hetzner has excellent pricing, and Kubernetes lets us build a reliable system out of potentially less reliable servers - cheap, reliable, and fast - is it possible?

We'll be building a vanilla cluster using kubespray for provisioning, but this tutorial will mostly focus on the Hetzner-specific pieces - if you're looking for a more generic kubespray tutorial, check out Red Hat's tutorial or the official docs.

For our cluster's workers we'll be using three of Hetzner's AX51 servers, and for the master we'll be using one AX41 server. All together this comes to about $315/month. A rough calculation suggests that (although the offerings are far from equal in many ways) this is slightly less than an order of magnitude cheaper than the equivalent AWS compute, memory, and NVMe. Another way to think about it: you could have a dedicated Hetzner server with KubeSail enterprise support for about the same price as an unmanaged AWS cluster!

In total, this gives us 24 CPUs, 192GB of RAM, and 7TB of NVMe SSDs. Not too shabby at all!

For this tutorial we'll be using (and we suggest) Ubuntu 20.04; however, most steps should work with any modern distribution. Note that we'll assume you can set up an SSH key and access the systems.
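If you're starting completely fresh, a minimal sketch of that setup might look like the following - this assumes you add the public key via the Robot console (or during installimage) and that root login via SSH key is enabled:

# Generate a key locally if you don't already have one:
> ssh-keygen -t ed25519 -C "hetzner-k8s"
# After adding the public key to the server(s), confirm access:
> ssh root@MASTER_PUBLIC_IP hostname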

Firewall

First, create a new firewall template for our nodes and make it the default for all servers. Remember that Kubernetes networking requires an open, flat network between our nodes. (A sketch of a possible rule set follows the notes below.)

Hetzner Firewall

Some notes:

  • Rule #2: Only use this if you plan on exposing high-port services! You may not need it, and it can be a security concern, so leave it disabled unless you know you need it!

  • Rule #4: The Hetzner firewall works in both directions - so you'll need to allow outbound DNS!
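For reference, here's a rough, hypothetical sketch of what such a rule set might look like - the numbering matches the notes above, but the exact rules and fields depend on your needs and on the Robot firewall UI, so treat this purely as an illustration:

# #1  Incoming   TCP        dst port 22             accept    # SSH
# #2  Incoming   TCP        dst port 30000-60000    accept    # optional: exposed NodePort range
# #3  Incoming   TCP        ACK flag set            accept    # replies to outbound connections
# #4  Outgoing   UDP/TCP    dst port 53             accept    # outbound DNS
# #5  Everything else                               discard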

Name your servers sanely

Typically in Kubernetes, our servers are cattle, not pets. However, in a dedicated world, servers don't come and go willy-nilly - and thus it's safe and helpful to give them reasonable names. You can do this in the Hetzner console:

Hetzner Naming

It's important to keep the mapping of addresses to names handy - we'll create an Ansible inventory with it shortly!

Set up networking

The Hetzner dedicated console (Robot) includes a product called a vSwitch - a virtual networking device for your hardware. Create a new vSwitch with VLAN ID 4000.

Make sure to add each of your servers (workers and masters) to the vSwitch - this will form our internal network. On each of the systems, we'll want to create a network interface which uses our new vSwitch; for example, our second worker might look like this:

> hostnamectl set-hostname worker-2
# edit /etc/netplan/01-netcfg.yaml ...
# Add a VLAN section:
#
# network:
#   version: 2
#   renderer: networkd
#   vlans:
#     enp35s0.4000:
#       id: 4000
#       link: enp35s0
#       mtu: 1400
#       addresses:
#         - 10.1.0.102/24 # Note that worker 2 is 102, worker 3 would be 103, etc...
#       routes:
#         - to: 0.0.0.0/0
#           via: 10.1.0.1
#           table: 1
#           on-link: true
#
> netplan apply

After this step, each system should be able to ping the others (e.g. ping 10.1.0.101).
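A quick way to verify connectivity from any one of the nodes is a small loop like this - adjust the address list to match your own inventory:

for ip in 10.1.0.1 10.1.0.101 10.1.0.102 10.1.0.103; do
  ping -c 1 -W 2 "$ip" > /dev/null && echo "$ip OK" || echo "$ip UNREACHABLE"
done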

Load Balancer

You'll want a load balancer for ingress - luckily, Hetzner Cloud Load Balancers now support dedicated servers! Head over to the Hetzner Cloud console and create a new load balancer. You can add dedicated servers (as IP targets) under the "Targets" section.

We'll target port 444 here, as that's where we'll expose our Ingress controller. Note that port 443 is used internally by kubespray, and you should not bind HostPort services to port 443 - the load balancer itself, however, can safely listen on 443.
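If you prefer the CLI over the web console, something along these lines should do it with the hcloud tool - the name lb-ingress, the type, and the location are just examples, and you should double-check the exact flags against hcloud load-balancer --help, since they're written here from memory:

> hcloud load-balancer create --name lb-ingress --type lb11 --location fsn1
# Forward public 443 to port 444 on the targets, where our Ingress controller will listen:
> hcloud load-balancer add-service lb-ingress --protocol tcp --listen-port 443 --destination-port 444
# Dedicated (Robot) servers are added as IP targets:
> hcloud load-balancer add-target lb-ingress --ip WORKER1_PUBLIC_IP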

System setup

Ideally, we'll provision Kubernetes via Ansible and then manage everything else via Kubernetes itself. However, there may be reasons you'll want to do some manual setup on the hosts. At the very least, we can ensure some basic tools that Ansible and sysadmins might need are installed:

> apt-get update -yqq
> apt-get upgrade -yqq
# Some of our favorites:
> apt-get install -yqq vim gpg gnupg python3 htop iftop iotop net-tools fail2ban

Ansible setup

  1. Install Ansible (this can be a non-trivial task, but I recommend starting by trying pip3 install ansible).

  2. Next, clone kubespray and create an inventory directory inside kubespray/inventory/ - we'll call ours hetzner (i.e. mkdir -p kubespray/inventory/hetzner). A rough sketch of this step follows the inventory file below.

  3. Install mitogen, which dramatically speeds up Ansible execution:

> ansible-playbook kubespray/mitogen.yml
  4. Create an inventory file, kubespray/inventory/hetzner/inventory.ini, and fill out the details:
[all]
master-1 ansible_host=MASTER_PUBLIC_IP ip=10.1.0.1 etcd_member_name=etcd1
worker-1 ansible_host=WORKER1_PUBLIC_IP ip=10.1.0.101
worker-2 ansible_host=WORKER2_PUBLIC_IP ip=10.1.0.102
worker-3 ansible_host=WORKER3_PUBLIC_IP ip=10.1.0.103

[kube-master]
master-1

[etcd]
master-1

[etcd-backup]
master-1

[kube-node]
worker-1
worker-2
worker-3

[calico-rr]

[k8s-cluster:children]
kube-master
kube-node
calico-rr
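For reference, here's a rough sketch of step 2 above (getting kubespray and the inventory directory in place) - paths assume the upstream kubespray repo layout, and you may want to pin a release tag rather than the default branch:

> git clone https://github.com/kubernetes-sigs/kubespray.git
> cd kubespray
> pip3 install -r requirements.txt       # Ansible dependencies, per kubespray's docs
> mkdir -p inventory/hetzner             # or copy inventory/sample as a starting point
# ...then drop the inventory.ini shown above into inventory/hetzner/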

Next, let's define some base variables for kubespray in kubespray/inventory/hetzner/group_vars/all.yml

---
kube_version: v1.19.3

# Important to tell Kubernetes that we're not hosted on any cloud provider, we'll deal with that ourselves.
cloud_provider: external
primaryClusterIP: "{{ kube_service_addresses|ipaddr('net')|ipaddr(3)|ipaddr('address') }}"

# We'll use Calico to provision a private pod network on our VLAN.
# Note that from now on, if you see any address like `10.1.xxx.xxx` it is a "node network" address
# Any address like `10.2.xxx.xxx` is a "pod network" address powered by Calico
kube_service_addresses: 10.2.0.0/18
kube_pods_subnet: 10.2.64.0/18
kube_network_plugin: calico
kube_apiserver_node_port_range: 30000-60000
kubelet_preferred_address_types: InternalIP,ExternalIP,Hostname
loadbalancer_apiserver_port: 6443 # We'll put the Kubernetes API on port 6443, to keep 443 free on the host.
calico_mtu: 1350 # See https://docs.hetzner.com/robot/dedicated-server/network/vswitch/ (1400 - packet overhead)
calico_felix_prometheusmetricsenabled: true
calico_felix_prometheusmetricsport: 9092
calico_datastore: "etcd"

dns_mode: coredns
enable_nodelocaldns: true
container_manager: containerd

Provisioning the cluster

From the kubespray directory, you can now run:

> ansible-playbook -i inventory/hetzner/inventory.ini cluster.yml

And you should be off to the races! If you log in to the master node, kubectl get nodes should return what you'd expect. Now is a good time to install the KubeSail agent!

Notes for Cloud-less Kubernetes

One aspect of the cloud_provider: external choice above is that we lose a few features:

  1. Kubernetes will not be able to determine the correct public IP address of the node
  2. Nodes will never be marked as ready for containers (and you might pull your hair out!)

Let's fix both:

Fixing a Node's ExternalIP field

Here is an example script which adds an ExternalIP address to a node's status - it can be run to copy the public addresses from Hetzner into Kubernetes.

NODE_NAME="worker-3"
NODE_EXTERNAL_IP="SOME_IP_ADDRESS_HERE"
NAMESPACE="kube-system"
SERVICEACCT="default"

K8STOKEN=$(kubectl -n ${NAMESPACE} get secrets -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='${SERVICEACCT}')].data.token}"|base64 --decode)

curl -k -v -XPATCH \
  -H "Accept: application/json" \
  --header "Authorization: Bearer ${K8STOKEN}" \
  -H "Content-Type: application/json-patch+json" \
  https://MY_KUBERNETES_SERVER:6443/api/v1/nodes/${NODE_NAME}/status \
  --data "[{\"op\":\"add\",\"path\":\"/status/addresses/-\",\"value\":{\"type\":\"ExternalIP\",\"address\":\"${NODE_EXTERNAL_IP}\"}}]"

# To elevate a ServiceAccount to cluster-admin in order to do the above, for example:
kubectl -n kube-system create clusterrolebinding default-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default

# Cleanup the admin role after if you no longer need it:
kubectl -n kube-system delete clusterrolebinding default-admin
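Afterwards you can confirm the address actually landed on the node - a quick check might look like:

> kubectl get node worker-3 -o jsonpath='{.status.addresses}'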

Marking a cloudless Node as ready

Remove the `node.cloudprovider.kubernetes.io/uninitialized` taint:

> kubectl taint nodes worker-3 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
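To verify the taint is gone and the node is schedulable, something like this will do:

> kubectl describe node worker-3 | grep -i taints
> kubectl get nodes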

Ingress system

Since we pointed the Hetzner load balancer at port 444 of our hosts, we can install any Ingress system there. We suggest the NGINX Ingress Controller (https://kubernetes.github.io/ingress-nginx/deploy/#bare-metal). Modify the controller Deployment to use hostPorts:

  ports:
    - name: http
      containerPort: 80
      hostPort: 81
      protocol: TCP
    - name: https
      containerPort: 443
      hostPort: 444 # Remember, we cannot use hostPort: 443, as this is in-use by Kubernetes, and breaks things badly!
      protocol: TCP
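One simple way to make that change is to edit the controller Deployment in place - the namespace and Deployment names below assume the upstream bare-metal manifest's defaults, so adjust them if your install differs:

> kubectl -n ingress-nginx edit deployment ingress-nginx-controller
# ...then add the hostPort fields shown above to the controller container's ports.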

Now traffic to your load balancer should make its way to your NGINX ingress controller - and ingress is working properly!

Fix CPU frequency scaling

By default, the Ubuntu 20.04 image on Hetzner uses a power-saving CPU governor. Newer Ryzen processors really benefit from the performance governor, so let's enable that (credit to davidjamesstockton's awesome article):

# Query scaling_governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# View and compare current frequency to the min and max scaling
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq

# Set scaling_governor to 'performance'
find /sys/devices/system/cpu/ -maxdepth 1 -type d -name 'cpu[0-9]*' -exec bash -c 'echo performance > {}/cpufreq/scaling_governor' \;

Storage systems

We install and manage Rook/Ceph for our clients. On Hetzner, the NVMe disks hold the entire system and are assembled into a software RAID array (/dev/md2 below); note that the per-drive fail/re-add steps only make sense for a redundant layout such as RAID 1. Installing and managing a Rook cluster is out of the scope of this tutorial - but for the sake of sysadmin helpfulness, something like the following can be used from the Hetzner rescue console to shrink the primary partition and make room for a storage-system partition:

# Check for health
> e2fsck -fy /dev/md2
# Shrink logical FS
> resize2fs /dev/md2 60G
# Check for health
> e2fsck -fy /dev/md2
# Resize raid array logical volume
> mdadm --grow --size 70G /dev/md2

# PER DRIVE
# fail first disk
> mdadm /dev/md2 --fail /dev/nvme1n1p3
# Status
> mdadm --detail /dev/md2
# remove first disk
> mdadm /dev/md2 --remove /dev/nvme1n1p3
# Use fdisk to resize partition
> fdisk /dev/nvme1n1 # (+80G, don't remove the RAID signature, set the partition type to 'fd')
# Re add partition to raid
> mdadm -a /dev/md2 /dev/nvme1n1p3
# Status (wait for rebuild)
> watch -n 3 mdadm --detail /dev/md2
# END PER DRIVE

# When done with all disks, max-size the filesystem
> resize2fs /dev/md2

Overview

Not so bad! But there are plenty of pieces you may need a hand with - that's why we're here! Reach out to us if you're interested in our managed services! Topics we cover with our managed service that aren't addressed here include storage, monitoring, backups, metrics, logs, and more. KubeSail can assist with any part of building and managing your clusters!

Thanks! Feel free to reach out on Discord or Twitter and let us know if you have any feedback!

Stay in the loop!

Join our Discord server, give us a shout on Twitter, check out some of our GitHub repos, and be sure to join our mailing list!