Reputation: 291
I have the following CronJob, which deletes pods in a specific namespace.
I run the job as-is, but it doesn't seem to run every 20 minutes; it runs every few (2-3) minutes instead. What I need is for the job to start every 20 minutes, delete the pods in the specified namespace, and then terminate. Any idea what could be wrong here?
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: sa
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl get pods -o name | while read -r POD; do kubectl delete "$POD"; sleep 30; done
I'm really not sure why this happens. Maybe the pod deletion causes the job itself to collapse?
Update
I tried the following, but no pods were deleted. Any idea why?
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            name: restart
        spec:
          serviceAccountName: pod-exterminator
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl get pods -o name --selector name!=restart | while read -r POD; do kubectl delete "$POD"; sleep 10; done.
Upvotes: 1
Views: 1754
Reputation: 20547
This CronJob's pod will delete itself at some point during execution, causing the job to fail and additionally resetting its back-off count.
The docs say:
The back-off count is reset when a Job's Pod is deleted or successful without any other Pods for the Job failing around that time.
You need to apply an appropriate filter. Also note that you can delete all pods with a single command.
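For example, without any filter, a single command removes every pod in the current namespace (which would also take out the CronJob's own pod, hence the need for the filter below):
kubectl delete pods --all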
Add a label to spec.jobTemplate.spec.template.metadata
that you can use for filtering.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            name: restart # label the pod
Then use this label to delete all pods that are not the cronjob pod.
kubectl delete pod --selector name!=restart
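If you want to check which pods the selector matches before deleting anything, you can run the same selector with get first:
kubectl get pods --selector 'name!=restart'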
Since you state in the comments that you need a loop, a full working example may look like this.
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
  namespace: sandbox
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            name: restart
        spec:
          serviceAccountName: restart
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - |
                  kubectl get pods -o name --selector "name!=restart" |
                    while read -r POD; do
                      kubectl delete "$POD"
                      sleep 30
                    done
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: restart
  namespace: sandbox
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-management
  namespace: sandbox
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: restart-pod-management
  namespace: sandbox
subjects:
  - kind: ServiceAccount
    name: restart
    namespace: sandbox
roleRef:
  kind: Role
  name: pod-management
  apiGroup: rbac.authorization.k8s.io
kubectl create namespace sandbox
kubectl config set-context --current --namespace sandbox
kubectl run pod1 --image busybox -- sleep infinity
kubectl run pod2 --image busybox -- sleep infinity
kubectl apply -f restart.yaml # the above file
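If the pods don't get deleted, it is worth verifying that the service account actually has the required permissions. Assuming the names and namespace from the manifests above, and provided your own user is allowed to impersonate service accounts, kubectl auth can-i does that quickly:
kubectl auth can-i delete pods \
  --as system:serviceaccount:sandbox:restart \
  --namespace sandbox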
Here you can see how the first pod is getting terminated.
$ kubectl get all
NAME                         READY   STATUS        RESTARTS   AGE
pod/pod1                     1/1     Terminating   0          43s
pod/pod2                     1/1     Running       0          39s
pod/restart-27432801-rrtvm   1/1     Running       0          16s

NAME                    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/restart   */1 * * * *   False     1        17s             36s

NAME                         COMPLETIONS   DURATION   AGE
job.batch/restart-27432801   0/1           17s        17s
Note that this is actually slightly buggy, because between the time you read the pod list and the time you delete an individual pod from it, that pod may no longer exist. You could use the following to ignore those cases, since pods that are already gone don't need to be deleted.
kubectl delete "$POD" || true
That said, since you name your job restart, I assume the purpose of this is to restart the pods of some deployments. You could actually use a proper restart, leveraging Kubernetes update strategies.
kubectl rollout restart $(kubectl get deploy -o name)
With the default update strategy, this will lead to new pods being created first and making sure they are ready before terminating the old ones.
$ kubectl rollout restart $(kubectl get deploy -o name)
NAME                        READY   STATUS              RESTARTS   AGE
pod/app1-56f87fc665-mf9th   0/1     ContainerCreating   0          2s
pod/app1-5cbc776547-fh96w   1/1     Running             0          2m9s
pod/app2-7b9779f767-48kpd   0/1     ContainerCreating   0          2s
pod/app2-8d6454757-xj4zc    1/1     Running             0          2m9s
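The "new pods first" behavior comes from the Deployment's rolling update settings. If you want to make that guarantee explicit rather than relying on the defaults, you can pin the strategy in the Deployment spec; the values below are just an illustration:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take an old pod down before its replacement is ready
      maxSurge: 1         # allow one extra pod during the rollout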
kubectl rollout restart also works with DaemonSets.
$ kubectl rollout restart -h
Restart a resource.
Resource rollout will be restarted.
Examples:
# Restart a deployment
kubectl rollout restart deployment/nginx
# Restart a daemon set
kubectl rollout restart daemonset/abc
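Putting this together, the CronJob could run the rollout restart instead of deleting pods one by one. This is only a sketch based on the manifests above: the restart service account would then also need RBAC on Deployments in the apps API group (at least get, list, and patch), which the pod-management Role shown earlier does not grant. Since a rollout restart only touches Deployment pods, the name!=restart label filter is no longer needed.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
  namespace: sandbox
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: restart
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl rollout restart $(kubectl get deploy -o name)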
Upvotes: 5