Casey Flynn

Reputation: 14038

Kubernetes: container crashing on SIGQUIT signal for unknown reasons

I have a container that keeps crashing in my k8s cluster for unknown reasons. Its main process is an nginx server, and it appears to be receiving a SIGQUIT signal.

Dockerfile
# build environment
FROM node:16-alpine as build
WORKDIR /app
ENV PATH /app/node_modules/.bin:$PATH
COPY package.json ./
COPY package-lock.json ./
RUN npm ci --silent
RUN npm install [email protected] -g --silent
COPY . ./
RUN npm run build

# production environment
FROM nginx:stable-alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Container logs
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/11/11 06:40:37 [notice] 1#1: using the "epoll" event method
2021/11/11 06:40:37 [notice] 1#1: nginx/1.20.1
2021/11/11 06:40:37 [notice] 1#1: built by gcc 10.2.1 20201203 (Alpine 10.2.1_pre1)
2021/11/11 06:40:37 [notice] 1#1: OS: Linux 5.4.120+
2021/11/11 06:40:37 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2021/11/11 06:40:37 [notice] 1#1: start worker processes
2021/11/11 06:40:37 [notice] 1#1: start worker process 32
2021/11/11 06:40:37 [notice] 1#1: start worker process 33
10.15.128.65 - - [11/Nov/2021:06:40:41 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03>\x85O#\xCC\xB9\xA5j\xAB\x8D\xC1PpZ\x18$\xE5ah\xDF7\xB1\xFF\xAD\x22\x050\xC3.+\xB6+ \x0F}S)\xC9\x1F\x0BY\x15_\x10\xC6\xAAF\xAA\x9F\x9E_@dG\x01\xF5vzt\xB50&;\x1E\x15\x00&\xC0/\xC00\xC0+\xC0,\xCC\xA8\xCC\xA9\xC0\x13\xC0\x09\xC0\x14\xC0" 400 157 "-" "-" "-"
10.15.128.65 - - [11/Nov/2021:06:40:44 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03\xD8['\xE75x'\xC3}+v\xC9\x83\x84\x96EKn\xC5\xB6}\xEE\xBE\xD9Gp\xE9\x1BX<n\xB2 \xD9n\xD1\xC5\xFC\xF2\x8D\x92\xAC\xC0\xA8mdF\x17B\xA3y9\xDD\x98b\x0E\x996\xB6\xA5\xAB\xEB\xD4\xDA" 400 157 "-" "-" "-"
10.15.128.65 - - [11/Nov/2021:06:40:47 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03Fy\x03N\x0E\x11\x89k\x7F\xC5\x00\x90w}\xEB{\x7F\xB1=\xF0" 400 157 "-" "-" "-"
2021/11/11 06:40:47 [notice] 1#1: signal 3 (SIGQUIT) received, shutting down
2021/11/11 06:40:47 [notice] 32#32: gracefully shutting down
2021/11/11 06:40:47 [notice] 32#32: exiting
2021/11/11 06:40:47 [notice] 33#33: gracefully shutting down
2021/11/11 06:40:47 [notice] 32#32: exit
2021/11/11 06:40:47 [notice] 33#33: exiting
2021/11/11 06:40:47 [notice] 33#33: exit
2021/11/11 06:40:47 [notice] 1#1: signal 17 (SIGCHLD) received from 33
2021/11/11 06:40:47 [notice] 1#1: worker process 33 exited with code 0
2021/11/11 06:40:47 [notice] 1#1: signal 29 (SIGIO) received
2021/11/11 06:40:47 [notice] 1#1: signal 17 (SIGCHLD) received from 32
2021/11/11 06:40:47 [notice] 1#1: worker process 32 exited with code 0
2021/11/11 06:40:47 [notice] 1#1: exit
kubectl get pod PODNAME --output=yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2021-11-11T06:40:30Z"
  generateName: sgb-web-master-fb9f995fb-
  labels:
    app: sgb-web-master
    pod-template-hash: fb9f995fb
  name: sgb-web-master-fb9f995fb-zwhgl
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: sgb-web-master-fb9f995fb
    uid: 96ebf43d-e2e6-4632-a536-764bcab8daeb
  resourceVersion: "66168456"
  uid: ed80b0d0-6681-4c2a-8edd-16c8ef6bee86
spec:
  containers:
  - env:
    - name: PORT
      value: "80"
    image: cflynnus/saigonbros-web:master-d70f3001d130bf986da236a08e1fded4b64e8097
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: 80
        scheme: HTTPS
      initialDelaySeconds: 3
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: saigonbros-web
    ports:
    - containerPort: 80
      name: sgb-web-port
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        ephemeral-storage: 1Gi
        memory: 2Gi
      requests:
        cpu: 500m
        ephemeral-storage: 1Gi
        memory: 2Gi
    securityContext:
      capabilities:
        drop:
        - NET_RAW
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-rkwb2
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: gk3-autopilot-cluster-1-default-pool-43dd48b9-tf0n
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  readinessGates:
  - conditionType: cloud.google.com/load-balancer-neg-ready
  restartPolicy: Always
  schedulerName: gke.io/optimize-utilization-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-rkwb2
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'Pod is in NEG "Key{\"k8s1-301c19bd-default-sgb-web-master-80-48ae70f6\",
      zone: \"asia-southeast1-a\"}". NEG is not attached to any BackendService with
      health checking. Marking condition "cloud.google.com/load-balancer-neg-ready"
      to True.'
    reason: LoadBalancerNegWithoutHealthCheck
    status: "True"
    type: cloud.google.com/load-balancer-neg-ready
  - lastProbeTime: null
    lastTransitionTime: "2021-11-11T06:40:33Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-11-11T06:44:42Z"
    message: 'containers with unready status: [saigonbros-web]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-11-11T06:44:42Z"
    message: 'containers with unready status: [saigonbros-web]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-11-11T06:40:33Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://dfc32581c1edda1a221dc00cede918cfb93225e51e505ea7a9f935fc9ab893d5
    image: docker.io/cflynnus/saigonbros-web:master-d70f3001d130bf986da236a08e1fded4b64e8097
    imageID: docker.io/cflynnus/saigonbros-web@sha256:ff8d6d42511ed6520967007714dfbd46817fca06bb65ae984bc04a8b90346222
    lastState:
      terminated:
        containerID: containerd://dfc32581c1edda1a221dc00cede918cfb93225e51e505ea7a9f935fc9ab893d5
        exitCode: 0
        finishedAt: "2021-11-11T06:44:41Z"
        reason: Completed
        startedAt: "2021-11-11T06:44:30Z"
    name: saigonbros-web
    ready: false
    restartCount: 6
    started: false
    state:
      waiting:
        message: back-off 2m40s restarting failed container=saigonbros-web pod=sgb-web-master-fb9f995fb-zwhgl_default(ed80b0d0-6681-4c2a-8edd-16c8ef6bee86)
        reason: CrashLoopBackOff
  hostIP: 10.148.15.200
  phase: Running
  podIP: 10.15.128.103
  podIPs:
  - ip: 10.15.128.103
  qosClass: Guaranteed
  startTime: "2021-11-11T06:40:33Z"

Upvotes: 3

Views: 7827

Answers (1)

akop

Reputation: 7845

Your liveness probe is configured to use HTTPS against port 80. Just change the scheme to HTTP: look at the key spec.containers.livenessProbe.httpGet.scheme.

Because the probe fails, Kubernetes considers the pod not alive and sends the SIGQUIT to shut it down. Normally this behavior helps you: when your pod really isn't alive, Kubernetes restarts the app for you.
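
A sketch of the corrected probe, taken from the pod spec above with only the scheme changed:

livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 80
    scheme: HTTP   # was HTTPS; this nginx serves plain HTTP on port 80
  initialDelaySeconds: 3
  periodSeconds: 3
  successThreshold: 1
  timeoutSeconds: 1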


Edit

You can also see this behavior in your nginx logs:

10.15.128.65 - - [11/Nov/2021:06:40:41 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03>\x85O#\xCC\xB9\xA5j\xAB\x8D\xC1PpZ\x18$\xE5ah\xDF7\xB1\xFF\xAD\x22\x050\xC3.+\xB6+ \x0F}S)\xC9\x1F\x0BY\x15_\x10\xC6\xAAF\xAA\x9F\x9E_@dG\x01\xF5vzt\xB50&;\x1E\x15\x00&\xC0/\xC00\xC0+\xC0,\xCC\xA8\xCC\xA9\xC0\x13\xC0\x09\xC0\x14\xC0" 400 157 "-" "-" "-"
10.15.128.65 - - [11/Nov/2021:06:40:44 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03\xD8['\xE75x'\xC3}+v\xC9\x83\x84\x96EKn\xC5\xB6}\xEE\xBE\xD9Gp\xE9\x1BX<n\xB2 \xD9n\xD1\xC5\xFC\xF2\x8D\x92\xAC\xC0\xA8mdF\x17B\xA3y9\xDD\x98b\x0E\x996\xB6\xA5\xAB\xEB\xD4\xDA" 400 157 "-" "-" "-"
10.15.128.65 - - [11/Nov/2021:06:40:47 +0000] "\x16\x03\x01\x01\x00\x01\x00\x00\xFC\x03\x03Fy\x03N\x0E\x11\x89k\x7F\xC5\x00\x90w}\xEB{\x7F\xB1=\xF0" 400 157 "-" "-" "-"
2021/11/11 06:40:47 [notice] 1#1: signal 3 (SIGQUIT) received, shutting down

Those are the three configured liveness probes, fired with a period of three seconds. The request lines are unreadable because Kubernetes sends TLS handshake packets, which are not human-readable in a plain-text log. Immediately after the third failure (failureThreshold: 3), the shutdown follows.

The other way to spot it is to read the description of your pod. There you can see that HTTPS is configured against port 80. HTTPS conventionally runs on port 443, and this nginx only serves plain HTTP on port 80, so it must be a configuration error.
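
You can reproduce those 400 lines yourself by sending any HTTPS request to the plain-HTTP port (the pod IP here is just the one from the status above):

# TLS ClientHello sent to a plain-HTTP listener: curl's handshake fails,
# and the nginx access log records a 400 with the raw handshake bytes,
# exactly like the \x16\x03\x01... lines above
curl -kv https://10.15.128.103:80/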

Upvotes: 5
