Reputation: 765
When running a deployment I get downtime. Requests failing after a variable amount of time (20-40 seconds).
The readiness check for the entry container fails when the preStop sends SIGUSR1, waits for 31 seconds, then sends SIGTERM. In that timeframe the pod should be removed from the service as the readiness check is set to fail after 2 failed attempts with 5 second intervals.
How can I see the events for pods being added and removed from the service to find out what's causing this?
And events around the readiness checks themselves?
I use Google Container Engine version 1.2.2 and use GCE's network load balancer.
service:
apiVersion: v1
kind: Service
metadata:
name: myapp
labels:
app: myapp
spec:
type: LoadBalancer
ports:
- name: http
port: 80
targetPort: http
protocol: TCP
- name: https
port: 443
targetPort: https
protocol: TCP
selector:
app: myapp
deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 3
strategy:
type: RollingUpdate
revisionHistoryLimit: 10
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
version: 1.0.0-61--66-6
spec:
containers:
- name: myapp
image: ****
resources:
limits:
cpu: 100m
memory: 250Mi
requests:
cpu: 10m
memory: 125Mi
ports:
- name: http-direct
containerPort: 5000
livenessProbe:
httpGet:
path: /status
port: 5000
initialDelaySeconds: 30
timeoutSeconds: 1
lifecycle:
preStop:
exec:
# SIGTERM triggers a quick exit; gracefully terminate instead
command: ["sleep 31;"]
- name: haproxy
image: travix/haproxy:1.6.2-r0
imagePullPolicy: Always
resources:
limits:
cpu: 100m
memory: 100Mi
requests:
cpu: 10m
memory: 25Mi
ports:
- name: http
containerPort: 80
- name: https
containerPort: 443
env:
- name: "SSL_CERTIFICATE_NAME"
value: "ssl.pem"
- name: "OFFLOAD_TO_PORT"
value: "5000"
- name: "HEALT_CHECK_PATH"
value: "/status"
volumeMounts:
- name: ssl-certificate
mountPath: /etc/ssl/private
livenessProbe:
httpGet:
path: /status
port: 443
scheme: HTTPS
initialDelaySeconds: 30
timeoutSeconds: 1
readinessProbe:
httpGet:
path: /readiness
port: 81
initialDelaySeconds: 0
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 2
lifecycle:
preStop:
exec:
# SIGTERM triggers a quick exit; gracefully terminate instead
command: ["kill -USR1 1; sleep 31; kill 1"]
volumes:
- name: ssl-certificate
secret:
secretName: ssl-c324c2a587ee-20160331
Upvotes: 0
Views: 695
Reputation: 2835
When the probe fails, the prober will emit a warning event with reason as Unhealthy
and message as xx probe errored: xxx
.
You should be able to find those events using either kubectl get events
or kubectl describe pods -l app=myapp,version=1.0.0-61--66-6
(filter pods by its label).
Upvotes: 1