A Kubernetes Journey
Published: Wed, 01 Feb 2023 00:00:00 GMT by David Chan

So, I realized that a bunch of the TILs that I've posted on the sister "blog" to this blog are based on my journey with Kubernetes (yes, my unused blog has another unused blog attached to it... it's blogs all the way down), so I thought that it might be interesting to sit down and write a longer piece about my six-month process of transitioning all of my personal hosting infrastructure to Kubernetes.
What is Kubernetes?
If you're visiting this page, you've probably heard the word "Kubernetes" or "K8s" thrown around. It's possible, though, that even though you've heard the term, you have no idea what it means. Kubernetes (or, since we love abbreviations in web development, K8s) is, in the words of its homepage, "an open-source system for automating deployment, scaling, and management of containerized applications." In practice, for me it's a tool which helps to manage all of the infrastructure for anything I expose to the web. Famously developed at Google, Kubernetes excels at managing huge deployments of containerized applications, and can scale to the size of most of the cloud giants. It's an impressive piece of work. For more information on what exactly Kubernetes is, you can check out the Google Cloud page, but, actually, while googling this, I didn't find a great resource for people who aren't deep into devops. I suspect that says something about the complexity of getting this whole thing working – perhaps I should write something. But that's a blog post for another day.
Why on earth would you use Kubernetes for a personal project?
The first thing that anyone who has ever used Kubernetes will say is "Why would you ever want to use this for a personal project?" Kubernetes is notorious for its finicky deployments, impossible configuration files, dense documentation, steep learning curve, and complicated infrastructure.
Well, you see, I have a lot of personal projects. Currently, I manage over 20 different applications which are exposed to the web. One of them is this blog that you're reading right now. Another is the sister project to this blog, til.dchan.cc, which is a collection of much shorter "Today I learned" style posts. I run several services which handle backend data collection for my PhD. I run a system which automates checking each of the 15 servers that I manage for the lab at UC Berkeley for SSH accessibility (and makes that available in real time on statuspage.io). I run DDNS servers with healthchecks.io (that's a whole different blog post). I run a web service that updates me every day with the latest papers published in my field. And so, so, so many more things. At a certain point, running each of these apps on a loose collection of instances from sketchy VPS providers became dangerous and expensive, and it became super easy to forget that something was running (or where it was running), or even how much it was costing me to run. So I decided to move from this loose collection to a tight ship of dockerized containers.
Towards Kubernetes Everywhere
So, now that I've made this decision, what exactly did it take? What exactly was the price I paid...
Step 1: Dockerizing Every Application
So, the first problem that I had was that almost none of my projects were built into containers. Containers were a revolution in devops: you could package up all of your code and dependencies into a single image, and deploy that image on pretty much anything, pretty much anywhere, with one command. For somebody who is running deployments, this is amazing, but for somebody building personal projects, it's a nightmare of additional complexity. It took probably a few long weekends to get every one of my personal projects running on Docker.
Step 2: Finding a Container Registry
Next, I needed somewhere to host all of these Docker images. Because most of them are for personal projects that aren't open source, I didn't necessarily want to publish them to a public container registry like Docker Hub. For me, ghcr.io came to the rescue: its tight integration with GitHub (which I already use to store all of my personal code) and its relatively relaxed pricing structure gave me an excellent place to start (and for public repos, it's free!). Fantastic. Now, I had done most of the hard work, I thought. I have all of my code running in containers, and uploaded to a container registry somewhere. All I have to do is create a cluster and press go, right? Right?
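One wrinkle with a private registry (and it bites me again in Step 4 below): the cluster needs a credential before it can pull those images. Here's a minimal sketch of what that pull secret can look like; the name, username, and token are placeholders, and the same thing can be generated with `kubectl create secret docker-registry`:

```yaml
# Hypothetical pull secret for ghcr.io; the name, username, and token are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: ghcr-pull
type: kubernetes.io/dockerconfigjson
stringData:
  # stringData lets Kubernetes do the base64 encoding for you.
  .dockerconfigjson: |
    {
      "auths": {
        "ghcr.io": {
          "username": "<github-username>",
          "password": "<personal-access-token>",
          "auth": "<base64 of github-username:personal-access-token>"
        }
      }
    }
```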
Step 3: Setting up the control plane
So, Kubernetes needs a couple of things to run. It needs worker nodes which run each of your individual pods (read: servers that run containers), and it needs a control plane which organizes all of the work that those nodes do. Because I'm nothing if not cheap, I thought to myself, "OK, maybe I can provision a few VPS instances (at least three, they say in the docs) using my cheap (and sketchy) online VPS providers, and get this working." Turns out, that really doesn't work. I was about six hours deep into reading about how to link Kubernetes nodes together over the internet before I realized that this was a lost cause. Turns out that Kubernetes is designed to run on nodes which live inside a firewall-protected private network, and getting around that is a huge no. So I did the numbers, and I caved, and set up the cluster through Linode (I was influenced by all of the podcasts. So many podcasts). Each node in the cluster costs $10/month, and you need at least three, so we're at $30/month already. Yes, I know, not that much... but I'm a PhD student. That's the cost of dinner out in the Bay Area, for one person. I guess more hot dogs on rice for me. OK, so now I had the control plane; just a few lines of YAML to get my first app up and running to the public, right?
Step 4: My First Deployment (And Regret)
So, the first thing that I decided to do was to get a simple static page running: the landing page for a site I haven't yet built, starlight.dev. All I needed to do was serve a static file, so I had created a Docker container with nginx, and I needed to get it exposed to the internet. So I followed a tutorial and created a Deployment. And nothing happened. So I added some more YAML to make sure that my cluster was authorized to pull containers from ghcr. Well, that kinda worked. Now I could see my pod running in the cluster (woohoo!), but I couldn't access it from the outside world. Or really, from the inside world. Or really, from anywhere. Then I read another tutorial, and I wrote some more YAML, and I had a Service. But I still couldn't access it. And then I read another tutorial. And I realized that actually, I needed an "ingress", and without really knowing what that was, I somehow managed to get Traefik installed on the cluster, and then... I still couldn't access anything. Not only that, but ingresses cost money (the cloud load balancer behind one is $10/month), so we were up to $40/month, and I still couldn't even see my static page. After another few tutorials, I finally created an IngressRoute, which apparently talks to Traefik, and finally, finally, something happened. I saw a static page. On the internet. After at least 30 hours of messing around with it. Turns out, the Kubernetes learning curve is no joke. And honestly, at that point I still didn't really understand what I had done. It was just a huge collection of YAML files I had applied (to the default namespace, no less) to the cluster, and I had no idea what I was doing. To this day, I'm still not really sure that I know what I'm doing, but hey, maybe I'll figure it out one day.
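For the curious, this is roughly the shape of the YAML that step boiled down to: a Deployment that runs the container, a Service that gives the pods a stable address, and a Traefik IngressRoute that maps a hostname to that Service. The names, image path, and hostname are placeholders, and the IngressRoute apiVersion assumes the traefik.containo.us CRDs from Traefik v2 (newer releases use traefik.io/v1alpha1):

```yaml
# Rough sketch of the three objects behind one static site; nothing here is my real config.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: starlight
spec:
  replicas: 1
  selector:
    matchLabels:
      app: starlight
  template:
    metadata:
      labels:
        app: starlight
    spec:
      imagePullSecrets:
        - name: ghcr-pull          # the registry credential from Step 2
      containers:
        - name: web
          image: ghcr.io/<user>/starlight:latest
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: starlight
spec:
  selector:
    app: starlight
  ports:
    - port: 80
      targetPort: 80
---
# IngressRoute is Traefik's own CRD, not a core Kubernetes object.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: starlight
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`starlight.dev`)
      kind: Rule
      services:
        - name: starlight
          port: 80
```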
Step 5: SSL
Well, there was only one little problem with my static web page: it had the red lock of doom (i.e., no SSL), so the next step was to figure out how to get my cluster to hand out the right SSL certificates. Since Traefik was terminating the SSL connections, I had to figure out how to give Traefik the certs (from Cloudflare, which handles my DNS in strict mode) that it needed to prove that yes, indeed, I am David (or at least, that my sites are trusted by me). This required a new thing, cert-manager, which, after a bunch of fiddling around with YAML files, I managed to get deployed in my cluster. But cert-manager only handles the storing of the certificates; I still needed to generate them. Luckily, there's a tool from Cloudflare, origin-ca-issuer, which is supposed to interface between Cloudflare and my cluster to get the origin certificates sorted out correctly. There were installation instructions, and boy did I run them. Like two or three times. But each time, the container would crash. Turns out, it just needed more memory. Ah. Good times. But it finally worked. And I got the green lock. But yeah, another set of hours sunk into something that I have not (to this day) touched in my cluster. Sometimes I worry that if something dies, I won't be able to get all of the origin certs working again... but that's a problem for the David of the future.
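The piece of YAML that ties it all together is a cert-manager Certificate: cert-manager writes the issued cert and key into a Secret, and Traefik serves them once the IngressRoute from the previous step points at that Secret via a `tls.secretName` entry. The sketch below uses placeholder names, and the issuerRef assumes an OriginIssuer created by the origin-ca-issuer project:

```yaml
# Sketch of a Certificate for the static site; names and hostnames are placeholders.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: starlight-tls
spec:
  secretName: starlight-tls        # cert-manager stores the issued cert/key here
  dnsNames:
    - starlight.dev
  issuerRef:
    # Assumes an OriginIssuer installed via Cloudflare's origin-ca-issuer
    group: cert-manager.k8s.cloudflare.com
    kind: OriginIssuer
    name: cloudflare-origin
```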
Step 6: More Deployments
I'll skip over the hours sunk into each of my other deployments. From not adding the -n
to echo
when encoding secrets in base64, to figuring out how to attach Linode volumes (turns out, Linode block storage doesn't support ReadWriteMany
), to having to pay at least $1/month for each 10GB block storage (since even though my app only needs 150Mb of persistent storage, I still have to attach a volume), to getting MySQL running since Ghost needs that... to getting OOM errors for my pods, and installing metrics-server to track that. Man, it was a journey. But honestly, at this point, it was starting to feel a bit smoother. I (somewhat) understood how Kubernetes was working at this point, and I was starting to get the muscle memory required to write the YAML files which became a huge chunk of my infrastructure. I started using namespaces for each of my projects, and I started storing all of the YAML files in a GitHub repository, so that the entire cluster would be reproducible if something failed... I'm still not sure that it is, but since Linode's control plane hasn't failed yet, it's not something that I've had to worry about. It also became much easier over time to add new deployments. It's no longer a pain to worry about the infrastructure for a new side project, all I have to do is just use a cookie-cutter template to create a new deployment/service, and it's all up and running pretty quickly. Exposed to the internet. With SSL. All with one or two commands. And I get some redundancy built in (but honestly, none of my projects are popular enough to need it).
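Two of the recurring snippets from that stretch, sketched with placeholder names (the namespace, secret values, and storage class are illustrative, not copied from my setup):

```yaml
# Secrets: if you fill in `data` by hand, encode with `echo -n 'value' | base64`;
# without -n, the trailing newline ends up inside the secret and nothing works.
# Using stringData sidesteps the hand-rolled base64 entirely.
apiVersion: v1
kind: Secret
metadata:
  name: ghost-mysql
  namespace: blog
type: Opaque
stringData:
  MYSQL_PASSWORD: change-me
---
# Volumes: Linode block storage is ReadWriteOnce only, and the minimum size is 10Gi,
# even when the app needs a few hundred megabytes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ghost-content
  namespace: blog
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: linode-block-storage-retain   # the Linode CSI retain class (assumed name)
  resources:
    requests:
      storage: 10Gi
```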
Step 7: Cron Jobs
I have a bunch of things that run on their own, and honestly, this was one of the easiest things to get running. All I had to do was to create a YAML definition of the job, add a schedule, and it would pull and run the image every time it was scheduled. This was, by far, the easiest part of getting this whole thing to work, and to be honest, I would start here for anyone learning how Kubernetes works, or for anyone who wants to get their feet wet. Getting a CronJob running on Kubernetes is rewarding, and frankly, not that hard. Certainly beats SSL. And Reverse Proxies. And literally. Everything. Else.
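As a flavor of how little YAML this takes, here's a sketch of the daily paper-digest job; the name, image, and schedule are placeholders:

```yaml
# Minimal scheduled job; batch/v1 CronJob has been stable since Kubernetes 1.21.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: paper-digest
spec:
  schedule: "0 14 * * *"           # every day at 14:00 UTC
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          imagePullSecrets:
            - name: ghcr-pull      # the same registry credential as the deployments
          containers:
            - name: digest
              image: ghcr.io/<user>/paper-digest:latest
```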
Step 8: Figuring out how to get DNS to work
So, at this point, I had everything working, but every time I created a deployment, I needed to manually create the DNS entries in Cloudflare. Hunh, I thought. I can do this automatically! No need to handle anything manually! I am a Kubernetes master! I can do this myself. So I found this tool, external-dns, which did a great job of handling the DNS. Or so it claimed. Well, you see, because I was using Traefik IngressRoutes, there wasn't any built-in support. Oh no. I tried everything to get it working, but no dice. Finally, I stumbled across a tiny footnote in the docs – the custom resource (CRD) source, and the source code for the crd-source. By registering the CRD with the cluster and pairing it with a working external-dns installation, I was finally able to create DNS endpoints in YAML, and finally, my Kubernetes cluster was complete!
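With the CRD registered, a record is just another manifest. A sketch of a DNSEndpoint following the schema from external-dns's crd-source; the hostname and target IP are placeholders:

```yaml
# DNSEndpoint is external-dns's own CRD, not a core Kubernetes resource.
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: starlight
spec:
  endpoints:
    - dnsName: starlight.dev
      recordType: A
      recordTTL: 300
      targets:
        - 203.0.113.10             # the load balancer IP in front of Traefik
```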
So... there's this thing called "helm"
So, up until this point, I had been building out my cluster entirely from scratch. This means that I wasn't using helm, or rancher, or any of those handy tools that actually make Kubernetes usable. What does this mean? To run installs, it was a combination of cloning source code, building libraries, and running kubectl apply -f a lot. Like, a lot. But after setting all of this up, I realized that there's this tool called helm, which makes a lot of these things easier. Oh yeah, and there's a tool called rancher which makes this easier too. Oops. But I guess, learning experience?
So, is it worth it?
Now that I've been living with my (personal) Kubernetes cluster for a few months, I can honestly say... it's not worth it at all. The pain of setting things up, and getting the whole thing deployed, along with the costs that are associated with it (since all of the companies that have the useful tools assume that you're using it for corporate purposes – what even is $10/month for each ingress??), means that it's in no way worth it for a single user. Really, just get a beefy computer, and run docker swarm or something. So much easier.
But.
I love it. I love my new Kubernetes cluster. Not only is it so much easier to manage all of my personal applications, but it also feels like a pet that I've been tending to for several months. It's really fulfilling to say to myself: I can manage all of these projects using the best of the best tools that are out there. And frankly, the fact that in less than 30 seconds I can provision DNS, SSL, hardware infrastructure, routing, and databases for a new application using a cookiecutter template is amazing. It really allows me to put the time that I would have spent managing all of my infrastructure into new projects instead, and into worrying about what color the font on my home page should be. It's great.
What's Next?
So, what's next? I'll probably continue using this cluster for a while. It's a great tool, and it really helps amp up my productivity. As I find more things to run, I'll probably start figuring out how to scale the ingresses, and scale the underlying cluster... it'll probably break this blog at some point. But for now, my Kubernetes journey has reached a plateau. I'll end with one last meme, which I think sums everything up. Cheers!

David