jaanhio

Reputation: 813

Does it make sense to run multiple similar processes in a container?

A brief background to give some context to the question.

Currently my team and I are in the midst of migrating our microservices to k8s, to reduce the effort of maintaining multiple deployment tools & pipelines.

One of the microservices that we are planning to migrate is an ETL worker that listens to messages on SQS and performs multi-stage processing.

It is built using PHP Laravel, and we use supervisord to control how many processes run on each worker instance on AWS EC2. Each process basically executes a Laravel command to poll different queues for new messages. We also periodically adjust the number of processes to maximize utilization of each instance's compute power.

So the questions are:

- Is this method of deployment still feasible when moving to k8s?
- Is there still a need to "maximize" compute usage?
- Are we better off just running one process in each container, the "container way" (not sure what the tool is called; runit?)?

I read from multiple sources (e.g. https://devops.stackexchange.com/questions/447/why-it-is-recommended-to-run-only-one-process-in-a-container) that it is ideal for a container to run only one process. There's also the matter of recovering crashed processes, and how running supervisord might interfere with how the container runtime performs recovery. But I am not sure whether this applies to our use case.

Upvotes: 4

Views: 1905

Answers (1)

David Maze

Reputation: 158908

You should absolutely restructure this to run one process per container and one container per pod. You do not typically need an init system or a process manager like supervisord or runit (there is an argument for a dedicated init like tini that can handle the special pid-1 duties, such as reaping zombie processes and forwarding signals).
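For illustration, a minimal sketch of what the container spec might look like with the Laravel worker as the container's only process. The image name and queue name are placeholders, and I'm assuming the stock `queue:work` artisan command; substitute whatever command you run under supervisord today:

```yaml
# Pod template fragment: the artisan worker is the container's only
# process (pid 1), so Kubernetes sees its exit status directly.
# If you want tini's pid-1 handling, bake it into the image and
# prefix the command with it.
containers:
  - name: etl-worker
    image: registry.example.com/etl-worker:latest    # placeholder image
    command: ["php", "artisan", "queue:work", "sqs", "--queue=orders"]
```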

You mention two concerns here: restarting failed processes, and process placement in the cluster. Kubernetes handles both of these for you automatically.

If the main process in a Pod fails, Kubernetes will restart it. You don't need to do anything for this. If it fails repeatedly, it will start delaying the restarts. This functionality only works if the main process fails – if your container's main process is a supervisor process, you will never get a pod restart and you may not directly notice if a process can't start up at all.
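There is nothing to configure for this; shown here only for illustration, the relevant pod-spec field already defaults to restart-on-exit for Deployment-managed pods:

```yaml
# Pod spec fragment: restartPolicy: Always is already the default
# for pods managed by a Deployment.
spec:
  restartPolicy: Always   # kubelet restarts the container when it exits;
                          # repeated crashes back off (CrashLoopBackOff)
```

The restart count shows up in the RESTARTS column of `kubectl get pods`.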

Typically you'll run containers via Deployments that have some number of identical replica Pods. Kubernetes itself takes responsibility for deciding which node will run each pod; you don't need to manually specify this. The smaller the pods are, the easier it is to place them. Since you're controlling the number of replicas of a pod, you also want to separate concerns like web servers vs. queue workers so you can scale them independently.
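A sketch of what one such Deployment could look like. The name, replica count, image, and resource numbers are all made up; the point is that each queue worker gets its own small, independently scalable Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etl-worker-orders        # one Deployment per queue/concern
spec:
  replicas: 5                    # replaces supervisord's process count
  selector:
    matchLabels:
      app: etl-worker-orders
  template:
    metadata:
      labels:
        app: etl-worker-orders
    spec:
      containers:
        - name: worker
          image: registry.example.com/etl-worker:latest
          command: ["php", "artisan", "queue:work", "sqs", "--queue=orders"]
          resources:
            requests:            # small requests keep pods easy to schedule
              cpu: 250m
              memory: 256Mi
```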

Kubernetes has some ability to auto-scale, though the typical direction is to size the cluster based on the workload: in a cloud-oriented setup, if you add a new pod that requests more CPU than your cluster currently has available, the cluster autoscaler will provision a new node. The HorizontalPodAutoscaler is something of an advanced setup, but you can configure it so that the number of workers is a function of your queue length. Again, this works better if the only thing it's scaling is the worker pods, and not a collection of unrelated things packaged together.
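As a sketch, an `autoscaling/v2` HorizontalPodAutoscaler driven by an external queue-length metric might look like the following. Note that external metrics require a metrics adapter that can read SQS (KEDA or a CloudWatch metrics adapter are common choices), and the metric name below is a placeholder that depends entirely on which adapter you install:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: etl-worker-orders
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: etl-worker-orders            # scales only the worker Deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: sqs_messages_visible   # placeholder; depends on your adapter
        target:
          type: AverageValue
          averageValue: "30"           # aim for ~30 queued messages per pod
```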

Upvotes: 7
