Best practices managing docker nodes and swarms company-wide

Question

I had hard time to get the title right, but here is the problem:

Let's imagine I'm managing multiple projects hosted in docker, each project running in swarm across x nodes. Different projects range from tens to tens of thousands requests per second and demand could increase very fast for any given project.

Should I create new swarm and nodes (azure vms) for each project and scale them accordingly based on usage, which would result a large amount of small-to-large vms.

Or should I have a lot smaller pool of larger vms running maybe only one swarm which handles all the services. I'm fairly sure this would be more optimal solution because you lose the overhead of running the vm and also eliminates the vms that just stay on doing nothing because the service is currently not popular.

When you think of the pricing, cpu/ram scales linearly so there is no difference in having 1x 4 core machine vs 4x 1 core machine (unless you require large disk).

I have also had problems with vms that have miniscule amount of memory (1gb) because sometimes some random process eats all the memory and the machine is basically dead. Your load-balanced service might not need a lot of memory but you still need a lot of nodes to ensure reliability (os overhead problem with micro services).

One large swarm with large nodes makes so much sense performance/optimization wise, but I'm worried about the reliability. I know docker containers cannot have access to other containers or host data, but how about the swarm? Is it possible that one service floods/crashes the whole node or even the whole swarm and a nightmare ensues when all company's services are down.

Best practices managing docker nodes and swarms company-wide

Answers (1)

Related Questions