MarkNS
MarkNS

Reputation: 4021

Why is the kubelet slower than docker cli to start a container?

I'm trying to understand why one of my containers in a pod is slower to start when started by the kubelet, than it is when started via the docker cli directly on the GKE node itself.

Here's the kubelet log. The container is started, and but stays in an unready state for 23 seconds:

18:49:55.000 Container image "eu.gcr.io/proj/ns/myimage@sha256:fff668" already present on machine
18:49:55.000 Created container
18:49:56.000 Started container
18:49:56.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:49:58.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:00.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:02.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:04.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:06.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:08.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:10.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:12.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:14.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:16.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:18.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory

Finally the container actually really starts 23 seconds later. I know this because the very first thing it does is print the following log line, and then write the /tmp/healthy file for the readinessProbe.

18:50:18.000 17:50:18,572|MainThread|INFO|cli|Starting application 

However, as the following command shows by printing the current date, and then starting the container with the docker cli (on the same node as the kubelet is running on above) it should only take ~1 second to start the container.

mark@gke-cluster-3 ~ $ date ++%Y-%m-%d %H:%M:%S.%N; docker run -it eu.gcr.io/proj/ns/myimage@sha256:fff668
2017-11-25 16:37:01.188799045
2017-11-25 16:37:02,246|MainThread|INFO|cli|Starting application

It's driving me a bit nuts! Any thoughts about what could be causing this welcomed :)

Upvotes: 2

Views: 621

Answers (1)

MarkNS
MarkNS

Reputation: 4021

Turns out that the problem with the slow start up of these containers was constrained CPU for the Python interpreter during start up. I added a bash script that would print the datetime before starting the Python process and when varying the CPU resources available to the container, the problem becomes painfully clear.

cpu: 10m

2017-12-18 08:05:46,1513584346 starting script
2017-12-18 08:06:22,318|MainThread|INFO|cli|Application startup

cpu: 50m

2017-12-18 08:15:11,1513584911 starting script
2017-12-18 08:15:27,317|MainThread|INFO|cli|Application startup

cpu: 100m

2017-12-18 08:07:46,1513584466 starting script
2017-12-18 08:07:53,218|MainThread|INFO|cli|Application startup

cpu: 150m

2017-12-18 08:18:16,1513585096 starting script
2017-12-18 08:18:20,730|MainThread|INFO|cli|Application startup

cpu: 200m

2017-12-18 08:09:14,1513584554 starting script
2017-12-18 08:09:17,922|MainThread|INFO|cli|Application startup

It's a bit frustrating because the applications consume around 10m CPU at run time. I'm going to investigate module imports and other recommendations from here: https://lwn.net/Articles/730915/

Upvotes: 1

Related Questions