Reputation: 4021
I'm trying to understand why one of my containers in a pod is slower to start when started by the kubelet, than it is when started via the docker cli directly on the GKE node itself.
Here's the kubelet log. The container is started, and but stays in an unready state for 23 seconds:
18:49:55.000 Container image "eu.gcr.io/proj/ns/myimage@sha256:fff668" already present on machine
18:49:55.000 Created container
18:49:56.000 Started container
18:49:56.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:49:58.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:00.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:02.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:04.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:06.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:08.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:10.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:12.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:14.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:16.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
18:50:18.000 Readiness probe failed: cat: /tmp/healthy: No such file or directory
Finally the container actually really starts 23 seconds later. I know this because the very first thing it does is print the following log line, and then write the /tmp/healthy file for the readinessProbe.
18:50:18.000 17:50:18,572|MainThread|INFO|cli|Starting application
However, as the following command shows by printing the current date, and then starting the container with the docker cli (on the same node as the kubelet is running on above) it should only take ~1 second to start the container.
mark@gke-cluster-3 ~ $ date ++%Y-%m-%d %H:%M:%S.%N; docker run -it eu.gcr.io/proj/ns/myimage@sha256:fff668
2017-11-25 16:37:01.188799045
2017-11-25 16:37:02,246|MainThread|INFO|cli|Starting application
It's driving me a bit nuts! Any thoughts about what could be causing this welcomed :)
Upvotes: 2
Views: 621
Reputation: 4021
Turns out that the problem with the slow start up of these containers was constrained CPU for the Python interpreter during start up. I added a bash script that would print the datetime before starting the Python process and when varying the CPU resources available to the container, the problem becomes painfully clear.
cpu: 10m
2017-12-18 08:05:46,1513584346 starting script
2017-12-18 08:06:22,318|MainThread|INFO|cli|Application startup
cpu: 50m
2017-12-18 08:15:11,1513584911 starting script
2017-12-18 08:15:27,317|MainThread|INFO|cli|Application startup
cpu: 100m
2017-12-18 08:07:46,1513584466 starting script
2017-12-18 08:07:53,218|MainThread|INFO|cli|Application startup
cpu: 150m
2017-12-18 08:18:16,1513585096 starting script
2017-12-18 08:18:20,730|MainThread|INFO|cli|Application startup
cpu: 200m
2017-12-18 08:09:14,1513584554 starting script
2017-12-18 08:09:17,922|MainThread|INFO|cli|Application startup
It's a bit frustrating because the applications consume around 10m CPU at run time. I'm going to investigate module imports and other recommendations from here: https://lwn.net/Articles/730915/
Upvotes: 1