Reputation: 12352
Is it possible to fake a container to always be ready/live in kubernetes so that kubernetes thinks that the container is live and doesn't try to kill/recreate the container? I am looking for a quick and hacky solution, preferably.
Upvotes: 11
Views: 8683
Reputation: 11346
Liveness and Readiness probes are not required by k8s controllers, you can simply remove them and your containers will be always live/ready.
If you want the hacky approach anyways, use the exec
probe (instead of httpGet
) with something dummy that always returns 0
as exit code. For example:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 1
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.7.9
ports:
- containerPort: 80
livenessProbe:
exec:
command:
- touch
- /tmp/healthy
readinessProbe:
exec:
command:
- touch
- /tmp/healthy
Please note that this will turn into ready/live state only in case you did not specify the flag readOnlyRootFilesystem: true
.
Upvotes: 17
Reputation: 5075
I'd like to add background contextual information about why / how this can be useful to real world applications.
Also by pointing out some additional info about why this question is useful I can come up with an even better answer.
First off why might you want to implement a fake startup / readiness / liveness probe?
Let's say you have a custom containerized application, you're in a rush so you go live without any liveness or readiness probes.
Scenario 1:
You have a deployment with 1 replica, but you notice that whenever you go to update your app (push a new version via a rolling update), your monitoring platform reports occasionally 400, 500, and timeout errors during the rolling update. Post update you're at 1 replica and the errors go away.
Scenario 2:
You have enough traffic to warrant autoscaling and multiple replicas. You consistently get 1-3% errors, and 97% success.
Why are you getting errors in both scenarios?
Let's say it takes 1 minute to finish booting up / be ready to receive traffic. If you don't have readiness probes then newly spawned instances of your container will receive traffic before they've finished booting up / become ready to receive traffic. So the newly spawned instances are probably causing temporary 400, 500, and timeout errors.
How to fix:
You can fix the occasional errors in Scenario 1 and 2 by adding a readiness probe with an initialDelaySeconds (or startup probe), basically something that waits long enough for your container app to finish booting up.
Now the correct and proper best practice thing to do is to write a /health endpoint that properly reflects the health of your app. But writing an accurate healthcheck endpoint can take time. In many cases you can get the same end result (make the errors go away), without the effort of creating a /health endpoint by faking it, and just adding a wait period that waits for your app to finish booting up before sending traffic to it. (again /health is best practice, but for the ain't nobody got time for that crowd, faking it can be a good enough stop gap solution)
Below is a better version of a fake readiness probe:
Also here's why it's better
apiVersion: apps/v1
kind: Deployment
metadata:
name: useful-hack
labels:
app: always-true-tcp-probe
spec:
replicas: 1
strategy:
type: Recreate #dev env fast feedback loop optimized value, don't use in prod
selector:
matchLabels:
app: always-true-tcp-probe
template:
metadata:
labels:
app: always-true-tcp-probe
spec:
containers:
- name: nginx
image: nginx:1.7.9
startupProbe:
tcpSocket:
host: 127.0.0.1 #Since kubelet does the probes, this is node's localhost, not pod's localhost
port: 10250 #worker node kubelet listening port
successThreshold: 1
failureThreshold: 2
initialDelaySeconds: 60 #wait 60 sec before starting the probe
Additional Notes:
startupProbe:
httpGet:
host: google.com #default's to pod IP
path: /
port: 80
scheme: HTTP
successThreshold: 1
failureThreshold: 2
initialDelaySeconds: 60
---
startupProbe:
tcpSocket:
host: 1.1.1.1 #CloudFlare
port: 53 #DNS
successThreshold: 1
failureThreshold: 2
initialDelaySeconds: 60
The interesting bit:
Remember I said "httpGet, tcpSocket, and grcp liveness probes are done from the perspective of the node running kubelet (the kubernetes agent)." Kubelet runs on the worker node's host OS, which is configured for upstream DNS, in other words it doesn't have access to inner cluster DNS entries that kubedns is aware of. So you can't specify Kubernetes service names in these probes.
Additionally Kubernetes Service IPs won't work for the probes either since they're VIPs (Virtual IPs) that only* exist in iptables (*most cases).
Upvotes: 1