David Fernandez

Reputation: 585

Why does a Kubernetes Pod get into the Terminated state with the Completed reason and exit code 0?

I am struggling to find any answer to this in the Kubernetes documentation. The scenario is the following:

The pods are NodeJS server instances; they cannot complete on their own, since they are always running, waiting for requests.

Could this be Kubernetes internally rearranging pods? Is there any way to know when this happens? Shouldn't there be an event somewhere saying why it happened?
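
For reference, the closest I know of is listing the namespace events and describing the pod; the pod name and namespace below are placeholders:

kubectl get events --namespace my-namespace --sort-by=.metadata.creationTimestamp
kubectl describe pod my-pod --namespace my-namespace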

Update

This just happened in our prod environment. The result of describing the offending pod is:

api:
    Container ID:   docker://7a117ed92fe36a3d2f904a882eb72c79d7ce66efa1162774ab9f0bcd39558f31
    Image:          1.0.5-RC1
    Image ID:       docker://sha256:XXXX
    Ports:          9080/TCP, 9443/TCP
    State:          Running
      Started:      Mon, 27 Mar 2017 12:30:05 +0100
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 24 Mar 2017 13:32:14 +0000
      Finished:     Mon, 27 Mar 2017 12:29:58 +0100
    Ready:          True
    Restart Count:  1
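
Since the restart count is 1, the logs of the terminated container can still be pulled with kubectl's --previous flag, in case they hint at what happened just before the exit (pod name is a placeholder):

kubectl logs my-pod --namespace my-namespace --previous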

Update 2

Here is the deployment.yaml file used:

apiVersion: "extensions/v1beta1"
kind: "Deployment"
metadata:
  namespace: "${ENV}"
  name: "${APP}${CANARY}"
  labels:
    component: "${APP}${CANARY}"
spec:
  replicas: ${PODS}
  minReadySeconds: 30
  revisionHistoryLimit: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        component: "${APP}${CANARY}"
    spec:
      serviceAccount: "${APP}"
${IMAGE_PULL_SECRETS}
      containers:
      - name: "${APP}${CANARY}"
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
        image: "134078050561.dkr.ecr.eu-west-1.amazonaws.com/${APP}:${TAG}"
        env:
        - name: "KUBERNETES_CA_CERTIFICATE_FILE"
          value: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: "metadata.namespace"
        - name: "ENV"
          value: "${ENV}"
        - name: "PORT"
          value: "${INTERNAL_PORT}"
        - name: "CACHE_POLICY"
          value: "all"
        - name: "SERVICE_ORIGIN"
          value: "${SERVICE_ORIGIN}"
        - name: "DEBUG"
          value: "http,controllers:recommend"
        - name: "APPDYNAMICS"
          value: "true"
        - name: "VERSION"
          value: "${TAG}"
        ports:
        - name: "http"
          containerPort: ${HTTP_INTERNAL_PORT}
          protocol: "TCP"
        - name: "https"
          containerPort: ${HTTPS_INTERNAL_PORT}
          protocol: "TCP"

The Dockerfile of the image referenced in the above Deployment manifest:

FROM ubuntu:14.04
ENV NVM_VERSION v0.31.1
ENV NODE_VERSION v6.2.0
ENV NVM_DIR /home/app/nvm
ENV NODE_PATH $NVM_DIR/v$NODE_VERSION/lib/node_modules
ENV PATH      $NVM_DIR/v$NODE_VERSION/bin:$PATH
ENV APP_HOME /home/app

RUN useradd -c "App User" -d $APP_HOME -m app
RUN apt-get update; apt-get install -y curl
USER app

# Install nvm with node and npm
RUN touch $HOME/.bashrc; curl https://raw.githubusercontent.com/creationix/nvm/${NVM_VERSION}/install.sh | bash \
    && /bin/bash -c 'source $NVM_DIR/nvm.sh; nvm install $NODE_VERSION'

# nvm installs node under versions/node/, so point NODE_PATH and PATH there
ENV NODE_PATH $NVM_DIR/versions/node/$NODE_VERSION/lib/node_modules
ENV PATH      $NVM_DIR/versions/node/$NODE_VERSION/bin:$PATH

# Create app directory
WORKDIR /home/app
COPY . /home/app

# Install app dependencies
RUN npm install

EXPOSE 9080 9443
CMD [ "npm", "start" ]

npm start is an alias for a regular node app.js command that starts a NodeJS server on port 9080.
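
For completeness, the relevant part of package.json is roughly this (trimmed down to the start script):

{
  "scripts": {
    "start": "node app.js"
  }
}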

Upvotes: 11

Views: 24755

Answers (3)

Festus Fashola

Reputation: 161

I had a similar problem when we upgraded to Airflow version 2.x: pods were getting restarted even after the DAGs ran successfully.

After a long time of debugging, I resolved it by overriding the pod template and specifying it in the airflow.cfg file.

[kubernetes]
....
pod_template_file = {{ .Values.airflow.home }}/pod_template.yaml



---
# pod_template.yaml
apiVersion: v1
kind: Pod
metadata:
  name: dummy-name
spec:
  serviceAccountName: default
  restartPolicy: Never
  containers:
    - name: base
      image: dummy_image
      imagePullPolicy: IfNotPresent
      ports: []
      command: []

Upvotes: 0

Attila Molnár

Reputation: 144

If this is still relevant, we just had a similar problem in our cluster.

We managed to find more information by inspecting the logs from Docker itself. SSH onto your k8s node and run the following:

sudo journalctl -fu docker.service
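
You can also narrow the output down to the window in which the container exited; the timestamp and container ID below are placeholders:

sudo journalctl -u docker.service --since "2017-03-27 12:25" --until "2017-03-27 12:35" | grep 7a117ed92fe3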

Upvotes: 1

Yu-Ju Hong

Reputation: 7287

Check the version of Docker you run, and whether the Docker daemon was restarted during that time.

If the Docker daemon was restarted, all the containers would be terminated (unless you use the new "live restore" feature in 1.12). In some Docker versions, Docker may incorrectly report "exit code 0" for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
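
A rough way to check both from the node; this sketch assumes the stock systemd unit (which reloads config on SIGHUP) and that the node has no existing /etc/docker/daemon.json:

# Live restore requires Docker 1.12 or newer
docker version --format '{{.Server.Version}}'

# Look for daemon start/stop messages around the time the container exited
sudo journalctl -u docker.service

# Enable live restore so containers survive daemon restarts
echo '{ "live-restore": true }' | sudo tee /etc/docker/daemon.json
sudo systemctl reload docker   # SIGHUP; the daemon re-reads daemon.json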

Upvotes: 2
