Update

When I went through this initially, I didn't realise I was setting up a high-availability cluster meant just to run Rancher, which I didn't need. So instead I checked out the guide on running a single-node setup, and provisioned my Kubernetes cluster through the Rancher CLI, the way it was intended.

On the off-chance someone's interested in how I set up a high-availability cluster running Rancher, here goes.


I don’t know enough about Kubernetes or Rancher to create a guide, but here’s how I set them up so I could have a cluster of machines that can run Docker containers, for a growing project.

If you follow this and you come up against problems, check the guides listed in the Credits section, as I’m probably not knowledgeable enough to help.

Background

In March I wrote about Project Anvil, an attempt to create a Dockerised version of my web product. I've gone back and forth and round the houses with this project, and have come to the conclusion that the best thing to do is to keep the app as close to the current production version as possible. I've been making big changes to other parts of the system, and even though I'm still splitting a few things out into separate repos – which I think is the best course – I don't want to go the full SPA route.

So the first aim is to get the app deployed on a stack running Kubernetes (a system I don't really understand) via Rancher (an open source manager I've never used).

As I said up top, this is definitely not a guide, but a rundown of the steps I've taken so far that have led to a seemingly stable Rancher installation.

Provisioning the nodes and load balancer

I started by creating 4 DigitalOcean droplets, using the following naming scheme:

  • universe-node0
  • universe-node1
  • universe-node2
  • universe-node3
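If you'd rather script the droplet creation than click through the control panel, something like the following should work with DigitalOcean's doctl CLI. It's a sketch, not what I actually ran: the function name is made up, the region, size and SSH key are parameters you'd fill in yourself, and the ubuntu-16-04-x64 image slug is an assumption (it matches the Xenial Docker packages below, but check `doctl compute image list-distribution` for what's currently offered).

```shell
#!/bin/sh
# Hypothetical helper: create universe-node0..3 with doctl.
# Assumptions (not from the original post): region, size and SSH key are
# passed in, and the Ubuntu 16.04 image slug is still available.
create_universe_droplets() {
  local region="$1" size="$2" ssh_key="$3" i
  for i in 0 1 2 3; do
    # --wait blocks until each droplet is active before moving on
    doctl compute droplet create "universe-node$i" \
      --image ubuntu-16-04-x64 \
      --region "$region" \
      --size "$size" \
      --ssh-keys "$ssh_key" \
      --wait || return 1
  done
}
```

Usage would be along the lines of `create_universe_droplets <region> <size slug> <ssh key fingerprint>`; `doctl compute size list` shows the valid size slugs.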

Then I SSHed into universe-node0, which would act as the load balancer, and ran the following to install NginX:

$ apt-get update && apt-get -y install nginx

I then SSHed into the remaining nodes and ran the following to install Docker:

$ apt-get remove -y docker docker-engine docker.io
$ apt-get update && apt-get install -y apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
$ add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ apt-get update && apt-get install -y docker-ce=17.03.2~ce-0~ubuntu-xenial
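Since Docker is pinned to 17.03.2 above, it's worth confirming each node actually ended up on that version before handing them over to RKE. A minimal sketch, assuming root SSH access to each node (as rancher-cluster.yml uses later) and that this release reports its version as "17.03.2-ce"; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: check a node's Docker server version over SSH.
# Assumes root SSH access and that Docker 17.03.2 reports "17.03.2-ce".
check_node_docker() {
  local node_ip="$1" expected="${2:-17.03.2-ce}" actual
  actual=$(ssh "root@$node_ip" "docker version --format '{{.Server.Version}}'") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "$node_ip: Docker $actual"
  else
    echo "$node_ip: expected Docker $expected, found $actual" >&2
    return 1
  fi
}
```

Something like `for ip in <node1 ip> <node2 ip> <node3 ip>; do check_node_docker "$ip"; done` covers all three. Running `apt-mark hold docker-ce` on each node should also stop a routine `apt-get upgrade` from moving Docker off the pinned version.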

Configuring NginX

On universe-node0, I replaced /etc/nginx/nginx.conf with the following:

worker_processes 4;
worker_rlimit_nofile 40000;

events {
    worker_connections 8192;
}

http {
    server {
        listen 80;
        return 301 https://$host$request_uri;
    }
}

stream {
    upstream rancher_servers {
        least_conn;
        server <node1 ip>:443 max_fails=3 fail_timeout=5s;
        server <node2 ip>:443 max_fails=3 fail_timeout=5s;
        server <node3 ip>:443 max_fails=3 fail_timeout=5s;
    }

    server {
        listen 443;
        proxy_pass rancher_servers;
    }
}

I restarted NginX by running service nginx restart.
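If you end up with more than three Rancher nodes, typing out the upstream servers gets tedious. Here's a small hypothetical helper (my own invention, not part of NginX) that prints the upstream block from a list of IPs, reusing the least_conn, port 443 and max_fails/fail_timeout settings from the config above.

```shell
#!/bin/sh
# Hypothetical helper: print the upstream block for the stream section of
# nginx.conf from a list of Rancher node IPs, using the settings above.
render_rancher_upstream() {
  local ip
  printf 'upstream rancher_servers {\n'
  printf '    least_conn;\n'
  for ip in "$@"; do
    printf '    server %s:443 max_fails=3 fail_timeout=5s;\n' "$ip"
  done
  printf '}\n'
}
```

The output of `render_rancher_upstream <node1 ip> <node2 ip> <node3 ip>` goes inside the `stream { }` block. Before restarting, `nginx -t` checks the configuration for syntax errors, which is a cheap way to catch a typo before it takes the load balancer down.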

Setting up the FQDN

I added a DNS record pointing rancher.example.com at the load balancer's IP address (obviously I've redacted the real domain name).
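DNS changes can take a while to propagate, and the Let's Encrypt step later has to reach the load balancer by name, so it's worth checking the record first. A minimal sketch using dig; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if host $1 has an A record equal to IP $2.
# dig +short prints only the answers, one per line.
dns_points_to() {
  local host="$1" expected_ip="$2"
  dig +short "$host" A | grep -Fxq "$expected_ip"
}
```

For example, `dns_points_to rancher.example.com <load balancer ip> && echo ready`.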

Installing RKE

I downloaded the Rancher Kubernetes Engine (RKE) installer, and in a terminal, went to the directory the binary was downloaded to and ran chmod +x rke_darwin-amd64 to make it executable. I verified RKE was working by running ./rke_darwin-amd64 --version, then made it available from any location by running mv rke_darwin-amd64 /usr/local/bin/rke.
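The same download, chmod, version check and move steps can be scripted. This is a sketch assuming an amd64 machine and a release tag you pick yourself from the RKE GitHub releases page; the function names are made up, and moving into /usr/local/bin may need sudo on your setup.

```shell
#!/bin/sh
# Map `uname -s` output to the RKE release asset name (amd64 only).
rke_asset_name() {
  case "$1" in
    Darwin) echo "rke_darwin-amd64" ;;
    Linux)  echo "rke_linux-amd64" ;;
    *)      return 1 ;;
  esac
}

# Hypothetical installer: $1 is an RKE release tag you choose yourself.
install_rke() {
  local version="$1" asset
  asset=$(rke_asset_name "$(uname -s)") || { echo "unsupported OS" >&2; return 1; }
  # -L follows GitHub's redirect to the actual download
  curl -fsSL -o "$asset" \
    "https://github.com/rancher/rke/releases/download/${version}/${asset}" || return 1
  chmod +x "$asset"
  ./"$asset" --version || return 1
  mv "$asset" /usr/local/bin/rke
}
```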

Configuring Rancher

I created a file called rancher-cluster.yml, using the following template:

nodes:
  - address: <node1 ip>
    user: root
    role: [controlplane,etcd,worker]
    ssh_key_path: <node1 ssh key path>
  - address: <node2 ip>
    user: root
    role: [controlplane,etcd,worker]
    ssh_key_path: <node2 ssh key path>
  - address: <node3 ip>
    user: root
    role: [controlplane,etcd,worker]
    ssh_key_path: <node3 ssh key path>

addons: |-
  ---
  kind: Namespace
  apiVersion: v1
  metadata:
    name: cattle-system
  ---
  kind: ServiceAccount
  apiVersion: v1
  metadata:
    name: cattle-admin
    namespace: cattle-system
  ---
  kind: ClusterRoleBinding
  apiVersion: rbac.authorization.k8s.io/v1
  metadata:
    name: cattle-crb
    namespace: cattle-system
  subjects:
  - kind: ServiceAccount
    name: cattle-admin
    namespace: cattle-system
  roleRef:
    kind: ClusterRole
    name: cluster-admin
    apiGroup: rbac.authorization.k8s.io
  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: cattle-keys-ingress
    namespace: cattle-system
  type: Opaque
  data:
    tls.crt: <tls crt>
    tls.key: <tls key>
  ---
  apiVersion: v1
  kind: Service
  metadata:
    namespace: cattle-system
    name: cattle-service
    labels:
      app: cattle
  spec:
    ports:
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
    - port: 443
      targetPort: 443
      protocol: TCP
      name: https
    selector:
      app: cattle
  ---
  apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    namespace: cattle-system
    name: cattle-ingress-http
    annotations:
      nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
  spec:
    rules:
    - host: rancher.example.com
      http:
        paths:
        - backend:
            serviceName: cattle-service
            servicePort: 80
    tls:
    - secretName: cattle-keys-ingress
      hosts:
      - rancher.example.com
  ---
  kind: Deployment
  apiVersion: extensions/v1beta1
  metadata:
    namespace: cattle-system
    name: cattle
  spec:
    replicas: 1
    template:
      metadata:
        labels:
          app: cattle
      spec:
        serviceAccountName: cattle-admin
        containers:
        - image: rancher/rancher:latest
          args:
          - --no-cacerts
          imagePullPolicy: Always
          name: cattle-server
          ports:
          - containerPort: 80
            protocol: TCP
          - containerPort: 443
            protocol: TCP
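The template is full of <...> placeholders (node IPs, SSH key paths and the certificate values), and it's easy to miss one. Here's a hypothetical pre-flight check that lists any that are left; it assumes real values never contain "<" or ">", which holds for IPs, paths and base64 strings.

```shell
#!/bin/sh
# Hypothetical check: print any "<...>" placeholders left in the file
# (with line numbers) and fail if there are any.
check_no_placeholders() {
  local file="$1"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
  if grep -n '<[^>]*>' "$file"; then
    echo "unfilled placeholders in $file" >&2
    return 1
  fi
  return 0
}
```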

Getting an SSL certificate for the Rancher instance

I SSHed into the load balancer and ran the following to install the Let’s Encrypt certbot, so I could obtain an SSL certificate for my Rancher installation:

$ apt-get update && apt-get install -y software-properties-common && add-apt-repository ppa:certbot/certbot
$ apt-get update && apt-get install -y python-certbot-nginx 

I stopped NginX so that port 80 would be free for certbot's temporary webserver:

$ service nginx stop

I then ran certbot certonly and, when prompted, selected option 2 to spin up that temporary webserver.

I was then able to start the NginX server back up:

$ service nginx start
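The stop, certbot, start sequence above can also be done in one certbot call: --standalone picks the temporary-webserver option directly, and --pre-hook/--post-hook run commands before and after. A sketch with a made-up function name; certbot may still ask for an email address and the terms of service on a first run.

```shell
#!/bin/sh
# Hypothetical wrapper: obtain a certificate for $1 with certbot's standalone
# webserver, stopping NginX first (to free port 80) and starting it after.
obtain_rancher_cert() {
  local domain="$1"
  certbot certonly --standalone -d "$domain" \
    --pre-hook "service nginx stop" \
    --post-hook "service nginx start"
}
```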

Once the certificate was issued, I had two files:

  • /etc/letsencrypt/live/rancher.example.com/fullchain.pem
  • /etc/letsencrypt/live/rancher.example.com/privkey.pem
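Before encoding anything, a couple of openssl checks can rule out the certificate itself as the problem. This is a hypothetical sketch: the first function compares public keys (which works for both RSA and ECDSA keys) between the first certificate in fullchain.pem and privkey.pem, and the second checks the expiry date. Let's Encrypt certificates last 90 days, and the base64 copy pasted into rancher-cluster.yml won't pick up renewals, so the expiry check is worth re-running later.

```shell
#!/bin/sh
# Succeeds if the first certificate in $1 matches the private key in $2.
cert_matches_key() {
  local cert_pub key_pub
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout) || return 1
  [ "$cert_pub" = "$key_pub" ]
}

# Succeeds if the certificate in $1 is still valid $2 days from now.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}
```

For example: `cert_matches_key /etc/letsencrypt/live/rancher.example.com/fullchain.pem /etc/letsencrypt/live/rancher.example.com/privkey.pem && echo match`.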

I base64-encoded the certificate:

$ echo $(base64 /etc/letsencrypt/live/rancher.example.com/fullchain.pem)

I then pasted the value into the tls.crt key (line 49) of the rancher-cluster.yml file. The echo $(...) wrapper joins base64's wrapped lines with spaces, so I removed any whitespace and line breaks to leave a single, unbroken string (not a YAML multi-line string).

I base64-encoded the certificate key:

$ echo $(base64 /etc/letsencrypt/live/rancher.example.com/privkey.pem)

I then pasted that value into the tls.key value (line 50) of the rancher-cluster.yml file, in the same manner as before.
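Since the hand-editing is where I went wrong, here's a sketch that does the encoding and pasting in one go. It's hypothetical (the function names are mine): it strips the newlines base64 adds, then swaps the strings in for the <tls crt> and <tls key> placeholders, keeping a .bak copy of the original file.

```shell
#!/bin/sh
# base64 wraps long output on Linux; strip newlines to get one unbroken string.
encode_single_line() {
  base64 < "$1" | tr -d '\n'
}

# Replace <tls crt> and <tls key> in YAML file $1 with the encoded $2 and $3.
fill_tls_placeholders() {
  local yml="$1" crt="$2" key="$3" f crt_b64 key_b64
  for f in "$yml" "$crt" "$key"; do
    [ -r "$f" ] || { echo "cannot read $f" >&2; return 1; }
  done
  crt_b64=$(encode_single_line "$crt")
  key_b64=$(encode_single_line "$key")
  # '|' is not in the base64 alphabet (A-Z, a-z, 0-9, +, /, =), so it is a
  # safe sed delimiter; -i.bak works with both GNU and BSD sed.
  sed -i.bak \
    -e "s|<tls crt>|${crt_b64}|" \
    -e "s|<tls key>|${key_b64}|" \
    "$yml"
}
```

On the load balancer that would be something like `fill_tls_placeholders rancher-cluster.yml /etc/letsencrypt/live/rancher.example.com/fullchain.pem /etc/letsencrypt/live/rancher.example.com/privkey.pem`.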

I ran the following to provision the nodes:

$ rke up --config <path to rancher-cluster.yml>

After a while, the output ended like this:

INFO[0021] [addons] Executing deploy job..
INFO[0031] [addons] User addons deployed successfully
INFO[0031] Finished building Kubernetes cluster successfully

The first few runs didn't work for me (rke reported success, but with warnings) because I’d stuffed up the certificate strings.
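Because a run that finishes "successfully" with warnings is easy to miss, here's a hypothetical wrapper (not an RKE feature) that treats logged warnings as a failure. Warning lines may look like "WARN[0012] ..." on a terminal or contain "level=warning" when output is redirected, so it matches either form; that format is an assumption on my part.

```shell
#!/bin/sh
# Hypothetical wrapper: run `rke up`, echo its output, and return
# 1+ if it failed, or 2 if it succeeded but logged warnings.
rke_up_strict() {
  local config="$1" log status
  log=$(mktemp) || return 1
  status=0
  # Capture stdout and stderr together so every log line gets checked.
  rke up --config "$config" >"$log" 2>&1 || status=$?
  cat "$log"
  # Match either "WARN[...]" lines or "level=warning" fields.
  if [ "$status" -ne 0 ]; then
    echo "rke up exited with status $status" >&2
  elif grep -Eq '^WARN|level=warning' "$log"; then
    echo "rke up finished but logged warnings; check the output above" >&2
    status=2
  fi
  rm -f "$log"
  return "$status"
}
```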

Once it was set up, I was able to pop my domain name into my browser, see my Rancher instance and set up my password.

Credits