Patrick

Reputation: 2709

Process on GKE finishes and restarts by itself

I've created a cluster on Google Kubernetes Engine:

gcloud container clusters create training-cluster \
    --num-nodes=1 \
    --zone=us-central1-a \
    --machine-type="n1-highmem-2" \
    --scopes="gke-default,storage-rw"

I get the credentials:

gcloud container clusters get-credentials --zone us-central1-a training-cluster

and apply my yaml file:

kubectl apply -f pod.yaml

The YAML file looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: gke-training-pod
spec:
  containers:
  - name: my-custom-container
    image: gcr.io/xyz/object-classification:gpu
    args:
      {I put my container arguments here}

I can see in the logs that training starts and eventually reaches the end. The problem is that the pod restarts every time it finishes, unless I delete my cluster. Is there an argument I should add to avoid this behavior?

Upvotes: 2

Views: 830

Answers (2)

CaioT

Reputation: 2211

If you want to run the pod only once (i.e. stop its lifecycle as soon as the training is done), then you have to change the restartPolicy to Never or OnFailure in your pod YAML definition file:

spec:
  containers:
  - name: my-custom-container
    image: gcr.io/xyz/object-classification:gpu
    args:
      {I put my container arguments here}
  restartPolicy: Never

Always means that the container will be restarted even if it exited with a zero exit code (i.e. successfully). This is useful when you don't care why the container exited, you just want to make sure that it is always running (e.g. a web server). This is the default.

OnFailure means that the container will only be restarted if it exited with a non-zero exit code (i.e. something went wrong). This is useful when you want to accomplish a certain task with the pod and ensure that it completes successfully; if it doesn't, it will be restarted until it does.

Never means that the container will not be restarted regardless of why it exited.

Now, if you want to run the pod as a managed workload, whether once or on a schedule, the best approach is Kubernetes Jobs/CronJobs, as mentioned by Harsh. See the sketch below.
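
For a one-shot training run like the one in the question, a minimal Job sketch could look like the following; the image and the args placeholder are taken from the question, while the Job name and the backoffLimit value are assumptions:

apiVersion: batch/v1
kind: Job
metadata:
  name: gke-training-job
spec:
  backoffLimit: 2               # assumption: retry a failed run at most twice
  template:
    spec:
      containers:
      - name: my-custom-container
        image: gcr.io/xyz/object-classification:gpu
        args:
          {I put my container arguments here}
      restartPolicy: Never      # Job pods must use Never or OnFailure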

Upvotes: 2

Harsh Manvar

Reputation: 30083

If you are running the Pod directly, set restartPolicy: Never on it.

The spec of a Pod has a restartPolicy field with possible values Always, OnFailure, and Never. The default value is Always.

The restartPolicy applies to all containers in the Pod. restartPolicy only refers to restarts of the containers by the kubelet on the same node. After containers in a Pod exit, the kubelet restarts them with an exponential back-off delay (10s, 20s, 40s, …), that is capped at five minutes. Once a container has executed for 10 minutes without any problems, the kubelet resets the restart backoff timer for that container.
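
As a quick sanity check, the RESTARTS column and the container's last state show whether the kubelet is restarting the pod; for example, with the pod name from the question:

kubectl get pod gke-training-pod
kubectl describe pod gke-training-pod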

Alternatively, you can use Jobs or CronJobs in Kubernetes; their pods come and go once the container finishes its process.

https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/

If you use a CronJob, you can add successfulJobsHistoryLimit: 0 so that once your job finishes, Kubernetes removes the job and automatically deletes its pod from the cluster as well:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
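
If you want to trigger the same workload once on demand instead of waiting for the schedule, kubectl can create a one-off Job from the CronJob (hello-manual is an arbitrary name):

kubectl create job hello-manual --from=cronjob/hello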

Upvotes: 2
