Reputation: 2045
We are running our prod DBs within Docker, which works out well.
Now we are moving into managed K8s and putting e.g. Elasticsearch into it, which does not feel good at all. After the issues with the volumes were solved (with volumeClaimTemplates in a StatefulSet), clustering hit us hard: the nodes of the cluster simply do not find each other, even after hours of fiddling with a headless service in the Elasticsearch configs. Roughly the kind of setup we have been trying is sketched below.
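For reference, this is the sort of thing we have been fiddling with (the names and the discovery setting are illustrative of our attempt, not a working recipe):

    # Headless service: a DNS lookup for its name returns the pod IPs
    # directly instead of a single virtual IP
    apiVersion: v1
    kind: Service
    metadata:
      name: elasticsearch-discovery   # hypothetical name
    spec:
      clusterIP: None                 # this is what makes it headless
      selector:
        app: elasticsearch
      ports:
      - port: 9300                    # ES transport port for node-to-node traffic

    # elasticsearch.yml on each node then points discovery at that name,
    # something like:
    #   discovery.zen.ping.unicast.hosts: ["elasticsearch-discovery"]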
So I am guessing that it is not very wise to do that, and that we should keep DBs outside the K8s cluster, on VMs managed e.g. by Ansible.
What is your opinion about this?
Upvotes: 1
Views: 187
Reputation: 3759
Personally, I prefer to keep as much important state as possible outside of Kubernetes (k8s) or any other Container Orchestration Framework (COF from here on), and most people I have asked about this topic feel the same.

In the end, COFs are software which dynamically manages your containers (and their dedicated drives, if you must keep state). While this is very cool for stateless components, I do not feel easy about it when it comes to important state. The dynamism of COFs is achieved through an extra layer of complexity, and I don't want extra complexity managing important state, since more complexity also means more bug surface. In contrast to configuration management tools like Ansible or SaltStack, which run in a controlled fashion at times that you decide, COF algorithms run independently all the time and can make decisions which affect your database containers and drives too. This means that a bug in your COF configuration, or in the COF algorithm itself, can have severe consequences at any moment, possibly when you are least prepared for it. Do I need that dynamism in my critical data layer? Separate machines with controlled configuration management feel simpler and more reliable here.
Concerning k8s specifically, another point applies when you run self-managed clusters: upgrading the production cluster manually is quite an experience, and it feels much safer if a worst-case scenario there cannot destroy your whole state as well.
In the end there is also a clash of philosophies here. Ideally, containers should be completely stateless and disposable, which is the complete opposite of what a database is for. Of course we do not live in an ideal world, and sooner or later you reach the point where you have to keep some amount of state in your containers to make things work. Kubernetes offers persistent volumes to mount for that, and I think for non-critical data this is a good compromise (a minimal example follows below). But should critical data be managed by something that was primarily designed around stateless concepts, even if it now offers ways to manage state too? Opinions differ here, but I'd say no.
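To make that compromise concrete, a persistent volume claim is just a small manifest; a minimal sketch with made-up names and sizes:

    # PersistentVolumeClaim: k8s binds this to a PersistentVolume (or
    # provisions one dynamically) and the pod mounts it, so the data
    # survives container restarts and rescheduling.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: es-data              # hypothetical name
    spec:
      accessModes:
      - ReadWriteOnce            # mountable read-write by a single node
      resources:
        requests:
          storage: 10Gi          # made-up size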
That being said, in our current project we are still running ES clusters in k8s in production and have never experienced severe issues or data loss. We use the ES clusters for log/metric data and other non-critical data that could easily be re-imported in case of a total failure. As ES offers easy replication and scaling, using it inside k8s for non-critical data does not feel completely wrong, provided you keep the replication factor high. Strict master-slave databases like Postgres, on the other hand, I wouldn't run inside k8s in production. We use Postgres containers in our k8s test clusters to save cost, but in production we use managed DBs outside of k8s. We also run Redis master instances inside k8s, but only for caching purposes - so again, no critical state there.
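For illustration, raising the replication factor is a one-line settings change per index. ES expects the body as JSON; it is shown here in YAML form for readability, and the index name and count are made up:

    # Settings body for PUT <es-host>:9200/my-index/_settings
    index:
      number_of_replicas: 2      # two replica copies per primary shard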
Upvotes: 2
Reputation: 22884
Some of my clusters date back to as early as Kubernetes 1.2-alpha, and back then it was obvious that the really stateful services (a MySQL Galera cluster was the primary one in my case) needed to be kept outside the kube cluster. That has not changed much for me: even with 1.8 installed, my DB is still external. But it is also large and separate (it makes sense to have just MySQL on each of those hosts), and I would not use k8s features to upgrade it or limit its resources anyway.
In my opinion this is still a perfectly viable option, especially for large data stores that make sense to isolate and that reserve full node capacity.
On the other hand, if you have a WordPress blog to deploy, it can be perfectly reasonable to have its database as part of its Helm chart. Even in the case above, while prod has a separate DB, the stage and dev environments have a --set devdb.enabled=true flag that brings up the database inside the kube cluster instead of connecting to an external one (a sketch of how that toggle can look follows below). Another example I have is Prometheus, which I deploy fully on Kubernetes, although in both cases I did not have to struggle with clustering.
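A sketch of how such a toggle can be wired up in a chart (the devdb flag is from my setup; the rest of the names and the MySQL bits are illustrative):

    # values.yaml -- default is to connect to an external DB
    devdb:
      enabled: false

    # templates/devdb-deployment.yaml -- only rendered when the flag is
    # set, e.g. via: helm install --set devdb.enabled=true ...
    {{- if .Values.devdb.enabled }}
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ .Release.Name }}-devdb
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: devdb
      template:
        metadata:
          labels:
            app: devdb
        spec:
          containers:
          - name: mysql
            image: mysql:5.7                 # illustrative image
            env:
            - name: MYSQL_ROOT_PASSWORD
              value: devonly                 # throwaway dev-only credential
    {{- end }}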
The bottom line is that whatever suits your case best is the right solution for you :)
Upvotes: 2