Reputation: 595
On GKE, I tried using a readiness probe / liveness probe and posting alerts with Cloud Monitoring: https://cloud.google.com/monitoring/alerts/using-alerting-ui
As a test, I created a pod that has a readiness probe and a liveness probe. The probe checks failed every time, as I expected.
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 0
      periodSeconds: 10
      timeoutSeconds: 10
      successThreshold: 1
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 20
      periodSeconds: 60
      timeoutSeconds: 30
      successThreshold: 1
      failureThreshold: 3
Checking the GCP logs, both error logs initially showed up at the configured periodSeconds intervals.
Readiness probe: every 10 seconds
2021-02-21 13:26:30.000 JST Readiness probe failed: HTTP probe failed with statuscode: 500
2021-02-21 13:26:40.000 JST Readiness probe failed: HTTP probe failed with statuscode: 500
Liveness probe: every 1 minute
2021-02-21 13:25:40.000 JST Liveness probe failed: HTTP probe failed with statuscode: 500
2021-02-21 13:26:40.000 JST Liveness probe failed: HTTP probe failed with statuscode: 500
But after running this pod for several minutes:
$ kubectl get event
LAST SEEN TYPE REASON OBJECT MESSAGE
30m Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
25m Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
20m Warning BackOff pod/liveness-http Back-off restarting failed container
20m Normal Scheduled pod/liveness-http Successfully assigned default/liveness-http to gke-cluster-default-pool-8bc9c75c-rfgc
17m Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
17m Normal Pulled pod/liveness-http Successfully pulled image "k8s.gcr.io/liveness"
17m Normal Created pod/liveness-http Created container liveness
20m Normal Started pod/liveness-http Started container liveness
4m59s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
17m Warning Unhealthy pod/liveness-http Liveness probe failed: HTTP probe failed with statuscode: 500
17m Normal Killing pod/liveness-http Container liveness failed liveness probe, will be restarted
In my plan, I would create an alert policy whose condition is like:
but if the probe checks aren't called as I expect, these policies don't work; even though the pod is not running, the alert becomes resolved.
Why didn't the liveness probe run, and why did the interval of the readiness probe change?
Note: if there is another good alert policy for checking the liveness of a pod, I wouldn't care about this behavior. I would appreciate any advice on what kind of alert policy is ideal for checking pods.
Upvotes: 2
Views: 18069
Reputation: 14102
In the Configure Liveness, Readiness and Startup Probes documentation you can find the following information:
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
As the GKE master is managed by Google, you won't find kubelet logs using the CLI (you might try to use Stackdriver). I've tested this on a kubeadm cluster with the kubelet verbosity level set to 8.
When you use $ kubectl get events you only get events from the last hour (this can be changed in the Kubernetes settings on a kubeadm cluster, but I don't think it can be changed in GKE, as the master is managed by Google).
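For reference, on a self-managed control plane the event retention is controlled by the kube-apiserver --event-ttl flag (default 1h0m0s). A minimal sketch of the relevant fragment, assuming the standard kubeadm static-pod manifest path /etc/kubernetes/manifests/kube-apiserver.yaml:

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm default path)
spec:
  containers:
  - command:
    - kube-apiserver
    # keep events for 3 hours instead of the default 1 hour
    - --event-ttl=3h
    # ...other flags unchanged...
```

On GKE this flag is not accessible, since Google manages the control plane.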
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
37m Normal Starting node/kubeadm Starting kubelet.
...
33m Normal Scheduled pod/liveness-http Successfully assigned default/liveness-http to kubeadm
33m Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
33m Normal Pulled pod/liveness-http Successfully pulled image "k8s.gcr.io/liveness" in 893.953679ms
33m Normal Created pod/liveness-http Created container liveness
33m Normal Started pod/liveness-http Started container liveness
3m12s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
30m Warning Unhealthy pod/liveness-http Liveness probe failed: HTTP probe failed with statuscode: 500
8m17s Warning BackOff pod/liveness-http Back-off restarting failed container
The same command again after ~1 hour:
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
33s Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
5m40s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
15m Warning BackOff pod/liveness-http Back-off restarting failed container
The Readiness probe check was executed every 10 seconds for more than one hour.
Mar 09 14:48:34 kubeadm kubelet[3855]: I0309 14:48:34.222085 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 14:48:44 kubeadm kubelet[3855]: I0309 14:48:44.221782 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 14:48:54 kubeadm kubelet[3855]: I0309 14:48:54.221828 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
...
Mar 09 15:01:34 kubeadm kubelet[3855]: I0309 15:01:34.222491 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:01:44 kubeadm kubelet[3855]: I0309 15:01:44.221877 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:01:54 kubeadm kubelet[3855]: I0309 15:01:54.221976 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
...
Mar 09 15:10:14 kubeadm kubelet[3855]: I0309 15:10:14.222163 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:10:24 kubeadm kubelet[3855]: I0309 15:10:24.221744 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:10:34 kubeadm kubelet[3855]: I0309 15:10:34.223877 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
...
Mar 09 16:04:14 kubeadm kubelet[3855]: I0309 16:04:14.222853 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 16:04:24 kubeadm kubelet[3855]: I0309 16:04:24.222531 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
There are also Liveness probe entries:
Mar 09 16:12:58 kubeadm kubelet[3855]: I0309 16:12:58.462878 3855 prober.go:117] Liveness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 16:13:58 kubeadm kubelet[3855]: I0309 16:13:58.462906 3855 prober.go:117] Liveness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465470 3855 kuberuntime_manager.go:656] Container "liveness" ({"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) of pod liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a): Container liveness failed liveness probe, will be restarted
Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465587 3855 kuberuntime_manager.go:712] Killing unwanted container "liveness"(id={"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) for pod "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a)"
Total time of tests:
$ kubectl get po -w
NAME READY STATUS RESTARTS AGE
liveness-http 0/1 Running 21 99m
liveness-http 0/1 CrashLoopBackOff 21 101m
liveness-http 0/1 Running 22 106m
liveness-http 1/1 Running 22 106m
liveness-http 0/1 Running 22 106m
liveness-http 0/1 Running 23 109m
liveness-http 1/1 Running 23 109m
liveness-http 0/1 Running 23 109m
liveness-http 0/1 CrashLoopBackOff 23 112m
liveness-http 0/1 Running 24 117m
liveness-http 1/1 Running 24 117m
liveness-http 0/1 Running 24 117m
Liveness probe check wasn't called anymore
The liveness check is created when Kubernetes creates the pod and is recreated each time the Pod is restarted. In your configuration you have set initialDelaySeconds: 20, so after the pod is created Kubernetes will wait 20 seconds and then call the liveness probe 3 times (as you have set failureThreshold: 3). After 3 failures, Kubernetes will restart the pod according to its RestartPolicy. You will also be able to find this in the logs:
Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465470 3855 kuberuntime_manager.go:656] Container "liveness" ({"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) of pod liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a): Container liveness failed liveness probe, will be restarted
Why will it be restarted? The answer can be found in Container probes:
livenessProbe:
Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy.
The default restart policy in GKE is Always, so your pod will be restarted over and over again.
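If you did not want the container restarted after a failed liveness probe, you could set the policy explicitly; a minimal sketch (note that restartPolicy is a pod-level field, and that pods managed by a Deployment only allow Always):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  restartPolicy: Never   # default is Always; OnFailure is the third option
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
```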
Readiness probe check was called, but the interval became long (maximum interval looks to be about 10 minutes)
I think you've come to that conclusion based on $ kubectl get events and $ kubectl describe po. In both cases, events are removed after 1 hour by default. In my Tests part you can see that the Readiness probe entries run from 14:48:34 till 16:04:24, so Kubernetes calls the Readiness probe every 10 seconds.
Why didn't the Liveness probe run, and why did the interval of the Readiness probe change?
As I showed in the Tests part, the Readiness probe interval didn't change. What was misleading in this case was relying on $ kubectl get events. As for the Liveness probe, it is still being called, but only 3 times each time the pod is created/restarted. I've also included the output of $ kubectl get po -w; when the pod is recreated, you can find those liveness probe calls in the kubelet logs.
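The restart cycle described above follows directly from the probe parameters. A rough back-of-the-envelope sketch (ignoring the crash-loop back-off, which grows to as much as 5 minutes between restarts, and assuming each probe fails immediately):

```python
# Approximate time from container start until the kubelet kills it,
# based on the livenessProbe settings from the question.
initial_delay = 20    # initialDelaySeconds
period = 60           # periodSeconds
failure_threshold = 3 # failureThreshold

# The first probe fires after the initial delay; each further probe one period later.
probe_times = [initial_delay + i * period for i in range(failure_threshold)]
time_to_restart = probe_times[-1]

print(probe_times)      # [20, 80, 140]
print(time_to_restart)  # container is killed ~140s after it started
```

This matches the observed behavior: roughly three liveness failures per container lifetime, then a restart, while the readiness probe keeps firing every 10 seconds throughout.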
In my plan, I would create an alert policy, whose condition is like:
- if liveness probe error happens 3 times in 3 minutes
If the liveness probe fails 3 times, with your current setup it will restart this pod. In that situation you could use each restart to create an alert.
Metric: kubernetes.io/container/restart_count
Resource type: k8s_container
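As a sketch, a Cloud Monitoring MQL condition over that metric might look like the following; the pod-name filter, alignment window, and threshold are assumptions you would tune for your workload (see the alerting UI documentation linked in the question):

```
fetch k8s_container
| metric 'kubernetes.io/container/restart_count'
| filter (resource.pod_name == 'liveness-http')
| align delta(3m)
| every 3m
| condition val() > 2
```

Since restart_count is a cumulative counter, delta alignment turns it into "restarts within the window", which maps onto the "3 failures in 3 minutes" condition you described.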
You can find some more useful information in other Stack Overflow questions regarding Monitoring alerts.
Upvotes: 2