Jenney

Reputation: 291

k8s cron job runs multiple times

I have the following cronjob which deletes pods in a specific namespace.

I run the job as-is, but it doesn't run every 20 minutes; it runs every few (2-3) minutes. What I need is for the job to start every 20 minutes, delete the pods in the specified namespace, and then terminate. Any idea what could be wrong here?

apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: sa
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl get pods -o name | while read -r POD; do kubectl delete "$POD"; sleep 30; done

I'm really not sure why this happens...

Maybe deleting the pods somehow causes the job itself to collapse?

Update

I tried the following, but no pods were deleted. Any idea why?

apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            name: restart
        spec:
          serviceAccountName: pod-exterminator
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl get pods -o name --selector name!=restart | while read -r POD; do kubectl delete "$POD"; sleep 10; done

Upvotes: 1

Views: 1754

Answers (1)

The Fool

Reputation: 20547

This cronjob's pod will delete itself at some point during execution, causing the job to fail and, in addition, resetting its back-off count.

The docs say:

The back-off count is reset when a Job's Pod is deleted or successful without any other Pods for the Job failing around that time.
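
A quick way to confirm this is to watch the Jobs and the events they generate while the CronJob runs (a diagnostic sketch; run it in the namespace your CronJob lives in):

# each failed attempt shows up as a new pod created for the same job
kubectl get jobs --watch

# the events show the job's pods failing and being retried
kubectl get events --field-selector involvedObject.kind=Job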

You need to apply an appropriate filter. Also note that you can delete all pods with a single command.

Add a label to spec.jobTemplate.spec.template.metadata that you can use for filtering.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            name: restart # label the pod

Then use this label to delete all pods that are not the cronjob pod.

kubectl delete pod --selector name!=restart

Since you state in the comments that you need a loop, a full working example may look like this.

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
  namespace: sandbox
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            name: restart
        spec:
          serviceAccountName: restart
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - |
                  kubectl get pods -o name --selector "name!=restart" |
                    while read -r POD; do
                      kubectl delete "$POD"
                      sleep 30
                    done
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: restart
  namespace: sandbox
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-management
  namespace: sandbox
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: restart-pod-management
  namespace: sandbox
subjects:
  - kind: ServiceAccount
    name: restart
    namespace: sandbox
roleRef:
  kind: Role
  name: pod-management
  apiGroup: rbac.authorization.k8s.io

To test it, create a sandbox namespace with a couple of dummy pods and apply the manifest:

kubectl create namespace sandbox
kubectl config set-context --current --namespace sandbox
kubectl run pod1 --image busybox -- sleep infinity
kubectl run pod2 --image busybox -- sleep infinity
kubectl apply -f restart.yaml # the above file

Here you can see how the first pod is getting terminated.

$ kubectl get all
NAME                         READY   STATUS        RESTARTS   AGE
pod/pod1                     1/1     Terminating   0          43s
pod/pod2                     1/1     Running       0          39s
pod/restart-27432801-rrtvm   1/1     Running       0          16s

NAME                    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/restart   */1 * * * *   False     1        17s             36s

NAME                         COMPLETIONS   DURATION   AGE
job.batch/restart-27432801   0/1           17s        17s

Note that this is actually slightly buggy, because between the time you read the pod list and the time you delete an individual pod from that list, the pod may no longer exist. You can use the following to ignore those cases, since if a pod is already gone there is no need to delete it.

kubectl delete "$POD" || true
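
Applied to the loop in the manifest above, only the delete line changes:

kubectl get pods -o name --selector "name!=restart" |
  while read -r POD; do
    kubectl delete "$POD" || true  # ignore pods that disappeared in the meantime
    sleep 30
  done

kubectl delete also accepts --ignore-not-found, which has the same effect without masking other errors.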

That said, since you name your job restart, I assume the purpose of this is to restart the pods of some deployments. You could actually use a proper restart, leveraging Kubernetes update strategies.

kubectl rollout restart $(kubectl get deploy -o name)

With the default update strategy, this will lead to new pods being created first and making sure they are ready before terminating the old ones.

$ kubectl rollout restart $(kubectl get deploy -o name)
NAME                        READY   STATUS              RESTARTS   AGE
pod/app1-56f87fc665-mf9th   0/1     ContainerCreating   0          2s
pod/app1-5cbc776547-fh96w   1/1     Running             0          2m9s
pod/app2-7b9779f767-48kpd   0/1     ContainerCreating   0          2s
pod/app2-8d6454757-xj4zc    1/1     Running             0          2m9s
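
That new-pods-first behaviour comes from the Deployment's RollingUpdate strategy. The values shown below are the Kubernetes defaults for Deployments; you can tune maxSurge and maxUnavailable on your deployments if you need a different rollout pace.

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # extra pods allowed above the desired count, so replacements start first
      maxUnavailable: 25%  # how many old pods may be unavailable at once during the rollout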

This also works with daemonsets.

$ kubectl rollout restart -h
Restart a resource.

     Resource rollout will be restarted.

Examples:
  # Restart a deployment
  kubectl rollout restart deployment/nginx
  
  # Restart a daemon set
  kubectl rollout restart daemonset/abc
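
If you go that route, the container command in the CronJob above could be swapped for a rollout restart instead of pod deletion. This is a sketch, not a drop-in: the ServiceAccount would then also need list and patch permissions on deployments in the apps API group, which the Role above does not grant.

              command:
                - /bin/sh
                - -c
                - kubectl rollout restart $(kubectl get deployments -o name)

The extra Role rule would look something like:

  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "patch"]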

Upvotes: 5
