Reputation: 14038
I have a container that keeps crashing in my k8s cluster for an unknown reason. The container's process is an nginx server, and it appears to be receiving a SIGQUIT signal.
# build environment
FROM node:16-alpine as build
WORKDIR /app
ENV PATH /app/node_modules/.bin:$PATH
COPY package.json ./
COPY package-lock.json ./
RUN npm ci --silent
RUN npm install [email protected] -g --silent
COPY . ./
RUN npm run build
# production environment
FROM nginx:stable-alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/11/11 06:40:37 [notice] 1#1: using the "epoll" event method
2021/11/11 06:40:37 [notice] 1#1: nginx/1.20.1
2021/11/11 06:40:37 [notice] 1#1: built by gcc 10.2.1 20201203 (Alpine 10.2.1_pre1)
2021/11/11 06:40:37 [notice] 1#1: OS: Linux 5.4.120+
2021/11/11 06:40:37 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2021/11/11 06:40:37 [notice] 1#1: start worker processes
2021/11/11 06:40:37 [notice] 1#1: start worker process 32
2021/11/11 06:40:37 [notice] 1#1: start worker process 33
10.15.128.65 - - [11/Nov/2021:06:40:41 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03>\x85O#\xCC\xB9\xA5j\xAB\x8D\xC1PpZ\x18$\xE5ah\xDF7\xB1\xFF\xAD\x22\x050\xC3.+\xB6+ \x0F}S)\xC9\x1F\x0BY\x15_\x10\xC6\xAAF\xAA\x9F\x9E_@dG\x01\xF5vzt\xB50&;\x1E\x15\x00&\xC0/\xC00\xC0+\xC0,\xCC\xA8\xCC\xA9\xC0\x13\xC0\x09\xC0\x14\xC0" 400 157 "-" "-" "-"
10.15.128.65 - - [11/Nov/2021:06:40:44 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03\xD8['\xE75x'\xC3}+v\xC9\x83\x84\x96EKn\xC5\xB6}\xEE\xBE\xD9Gp\xE9\x1BX<n\xB2 \xD9n\xD1\xC5\xFC\xF2\x8D\x92\xAC\xC0\xA8mdF\x17B\xA3y9\xDD\x98b\x0E\x996\xB6\xA5\xAB\xEB\xD4\xDA" 400 157 "-" "-" "-"
10.15.128.65 - - [11/Nov/2021:06:40:47 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03Fy\x03N\x0E\x11\x89k\x7F\xC5\x00\x90w}\xEB{\x7F\xB1=\xF0" 400 157 "-" "-" "-"
2021/11/11 06:40:47 [notice] 1#1: signal 3 (SIGQUIT) received, shutting down
2021/11/11 06:40:47 [notice] 32#32: gracefully shutting down
2021/11/11 06:40:47 [notice] 32#32: exiting
2021/11/11 06:40:47 [notice] 33#33: gracefully shutting down
2021/11/11 06:40:47 [notice] 32#32: exit
2021/11/11 06:40:47 [notice] 33#33: exiting
2021/11/11 06:40:47 [notice] 33#33: exit
2021/11/11 06:40:47 [notice] 1#1: signal 17 (SIGCHLD) received from 33
2021/11/11 06:40:47 [notice] 1#1: worker process 33 exited with code 0
2021/11/11 06:40:47 [notice] 1#1: signal 29 (SIGIO) received
2021/11/11 06:40:47 [notice] 1#1: signal 17 (SIGCHLD) received from 32
2021/11/11 06:40:47 [notice] 1#1: worker process 32 exited with code 0
2021/11/11 06:40:47 [notice] 1#1: exit
apiVersion: v1
kind: Pod
metadata:
annotations:
seccomp.security.alpha.kubernetes.io/pod: runtime/default
creationTimestamp: "2021-11-11T06:40:30Z"
generateName: sgb-web-master-fb9f995fb-
labels:
app: sgb-web-master
pod-template-hash: fb9f995fb
name: sgb-web-master-fb9f995fb-zwhgl
namespace: default
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: sgb-web-master-fb9f995fb
uid: 96ebf43d-e2e6-4632-a536-764bcab8daeb
resourceVersion: "66168456"
uid: ed80b0d0-6681-4c2a-8edd-16c8ef6bee86
spec:
containers:
- env:
- name: PORT
value: "80"
image: cflynnus/saigonbros-web:master-d70f3001d130bf986da236a08e1fded4b64e8097
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTPS
initialDelaySeconds: 3
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 1
name: saigonbros-web
ports:
- containerPort: 80
name: sgb-web-port
protocol: TCP
resources:
limits:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
requests:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
securityContext:
capabilities:
drop:
- NET_RAW
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-rkwb2
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: gk3-autopilot-cluster-1-default-pool-43dd48b9-tf0n
preemptionPolicy: PreemptLowerPriority
priority: 0
readinessGates:
- conditionType: cloud.google.com/load-balancer-neg-ready
restartPolicy: Always
schedulerName: gke.io/optimize-utilization-scheduler
securityContext:
seccompProfile:
type: RuntimeDefault
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: kube-api-access-rkwb2
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: null
message: 'Pod is in NEG "Key{\"k8s1-301c19bd-default-sgb-web-master-80-48ae70f6\",
zone: \"asia-southeast1-a\"}". NEG is not attached to any BackendService with
health checking. Marking condition "cloud.google.com/load-balancer-neg-ready"
to True.'
reason: LoadBalancerNegWithoutHealthCheck
status: "True"
type: cloud.google.com/load-balancer-neg-ready
- lastProbeTime: null
lastTransitionTime: "2021-11-11T06:40:33Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2021-11-11T06:44:42Z"
message: 'containers with unready status: [saigonbros-web]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2021-11-11T06:44:42Z"
message: 'containers with unready status: [saigonbros-web]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2021-11-11T06:40:33Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://dfc32581c1edda1a221dc00cede918cfb93225e51e505ea7a9f935fc9ab893d5
image: docker.io/cflynnus/saigonbros-web:master-d70f3001d130bf986da236a08e1fded4b64e8097
imageID: docker.io/cflynnus/saigonbros-web@sha256:ff8d6d42511ed6520967007714dfbd46817fca06bb65ae984bc04a8b90346222
lastState:
terminated:
containerID: containerd://dfc32581c1edda1a221dc00cede918cfb93225e51e505ea7a9f935fc9ab893d5
exitCode: 0
finishedAt: "2021-11-11T06:44:41Z"
reason: Completed
startedAt: "2021-11-11T06:44:30Z"
name: saigonbros-web
ready: false
restartCount: 6
started: false
state:
waiting:
message: back-off 2m40s restarting failed container=saigonbros-web pod=sgb-web-master-fb9f995fb-zwhgl_default(ed80b0d0-6681-4c2a-8edd-16c8ef6bee86)
reason: CrashLoopBackOff
hostIP: 10.148.15.200
phase: Running
podIP: 10.15.128.103
podIPs:
- ip: 10.15.128.103
qosClass: Guaranteed
startTime: "2021-11-11T06:40:33Z"
Upvotes: 3
Views: 7827
Reputation: 7845
Your liveness probe is configured as HTTPS on port 80. Just change it to HTTP; look at the key spec.containers.livenessProbe.httpGet.scheme.
Kubernetes thinks that your pod isn't alive (the liveness probe keeps failing) and sends the SIGQUIT. Normally this behavior helps you: when your pod isn't alive, Kubernetes restarts the container for you.
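For reference, here is a sketch of the corrected probe (the same fragment as in the pod spec below, with only the scheme changed):

```yaml
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 80
    scheme: HTTP   # was HTTPS; nginx serves plain HTTP on port 80
  initialDelaySeconds: 3
  periodSeconds: 3
  successThreshold: 1
  timeoutSeconds: 1
```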
You can also see this behavior in your nginx logs:
10.15.128.65 - - [11/Nov/2021:06:40:41 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03>\x85O#\xCC\xB9\xA5j\xAB\x8D\xC1PpZ\x18$\xE5ah\xDF7\xB1\xFF\xAD\x22\x050\xC3.+\xB6+ \x0F}S)\xC9\x1F\x0BY\x15_\x10\xC6\xAAF\xAA\x9F\x9E_@dG\x01\xF5vzt\xB50&;\x1E\x15\x00&\xC0/\xC00\xC0+\xC0,\xCC\xA8\xCC\xA9\xC0\x13\xC0\x09\xC0\x14\xC0" 400 157 "-" "-" "-"
10.15.128.65 - - [11/Nov/2021:06:40:44 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03\xD8['\xE75x'\xC3}+v\xC9\x83\x84\x96EKn\xC5\xB6}\xEE\xBE\xD9Gp\xE9\x1BX<n\xB2 \xD9n\xD1\xC5\xFC\xF2\x8D\x92\xAC\xC0\xA8mdF\x17B\xA3y9\xDD\x98b\x0E\x996\xB6\xA5\xAB\xEB\xD4\xDA" 400 157 "-" "-" "-"
10.15.128.65 - - [11/Nov/2021:06:40:47 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03Fy\x03N\x0E\x11\x89k\x7F\xC5\x00\x90w}\xEB{\x7F\xB1=\xF0" 400 157 "-" "-" "-"
2021/11/11 06:40:47 [notice] 1#1: signal 3 (SIGQUIT) received, shutting down
Those are three liveness probe requests, arriving with the configured period of three seconds. They are unreadable because Kubernetes sends TLS handshake packets (binary data, not human-readable as plain text), and nginx, which only speaks plain HTTP on port 80, answers each one with a 400. After the third failed probe (failureThreshold: 3), the shutdown follows immediately.
The other way is to read the description of your pod. There you can see that HTTPS and port 80 are configured together. HTTPS normally runs over port 443, so this combination points to a configuration error.
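If the pod is managed by a Deployment (the ReplicaSet name sgb-web-master-fb9f995fb suggests a Deployment named sgb-web-master, though that name is an assumption), you can apply the fix with a patch file rather than editing the full manifest. A sketch of such a patch:

```yaml
# probe-fix-patch.yaml (hypothetical filename)
# Strategic merge patch: only the scheme changes; the container is
# matched by name within the containers list.
spec:
  template:
    spec:
      containers:
      - name: saigonbros-web
        livenessProbe:
          httpGet:
            path: /
            port: 80
            scheme: HTTP
```

You would apply it with something like `kubectl patch deployment sgb-web-master --patch-file probe-fix-patch.yaml`, after which the Deployment rolls out new pods with the corrected probe.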
Upvotes: 5