Reputation: 97
My Kubernetes cluster has memory pressure limits that I need to fix (at a later time).
There are sometimes anywhere from a few evicted pods to dozens. I created a CronJob spec to clean up the evicted pods, and the command inside it works fine when I run it from PowerShell.
However, whether or not I specify a namespace in the spec, and even when I deploy it to every namespace that exists, the script doesn't seem to delete my evicted pods.
Original Script:
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-evicted-pods
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command: ["sh", "-c", "kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -"]
          restartPolicy: OnFailure
I tried creating the script with associated RBAC, with no luck either.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: development
  name: cronjob-runner
rules:
- apiGroups:
  - extensions
  - apps
  resources:
  - deployments
  verbs:
  - 'patch'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cronjob-runner
  namespace: development
subjects:
- kind: ServiceAccount
  name: sa-cronjob-runner
  namespace: development
roleRef:
  kind: Role
  name: cronjob-runner
  apiGroup: ""
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-cronjob-runner
  namespace: development
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-all-failed-pods
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-cronjob-runner
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -
          restartPolicy: OnFailure
I realize I should have better memory limits defined, but this functionality was working before I upgraded k8s to 1.16 from 1.14.
Is there something I'm doing wrong or missing? If it helps, I'm running in Azure (AKS).
Upvotes: 3
Views: 4788
Reputation: 458
You can also use this command to delete all the evicted pods across namespaces:
kubectl get pods --all-namespaces | grep Evicted | awk '{print $2 " --namespace=" $1}' | xargs kubectl delete pod --force
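If you want to check first which pods would be removed, the same pipeline without the delete step works as a preview (just a sanity check, not part of the original command):

# Preview the evicted pods (namespace/name) without deleting anything
kubectl get pods --all-namespaces | grep Evicted | awk '{print $1 "/" $2}'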
Upvotes: 0
Reputation: 44569
Your Role needs to be changed to a ClusterRole, because you are using --all-namespaces in the kubectl command:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cronjob-runner
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list", "delete"] # delete is needed because the output is piped into kubectl delete
The RoleBinding that you have is for the service account sa-cronjob-runner in the development namespace, but the CronJob you are running is actually in the default namespace. Hence it's using the default service account from the default namespace.
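You can confirm which service account a job pod actually ran with using something like this (a quick check, not part of the original answer; the pod name is a placeholder):

kubectl get pod <kubectl-runner-pod-name> -o jsonpath='{.spec.serviceAccountName}'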
So either specify namespace: development in the CronJob and set serviceAccountName: sa-cronjob-runner:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-evicted-pods
  namespace: development
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-cronjob-runner
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command: ["sh", "-c", "kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -"]
          restartPolicy: OnFailure
Or change the RoleBinding to bind the ClusterRole to the default service account in the default namespace:
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cronjob-runner
  namespace: development
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: cronjob-runner
  apiGroup: rbac.authorization.k8s.io
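Keep in mind that a RoleBinding only grants the ClusterRole's permissions inside its own namespace. Because the command deletes pods across all namespaces, a ClusterRoleBinding may be what is actually needed; a sketch reusing the names above (not part of the original answer):

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cronjob-runner
subjects:
- kind: ServiceAccount
  name: sa-cronjob-runner
  namespace: development
roleRef:
  kind: ClusterRole
  name: cronjob-runner
  apiGroup: rbac.authorization.k8s.io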
Upvotes: 4
Reputation: 61551
It sounds like after the upgrade this:
kubectl get pods --all-namespaces --field-selector 'status.phase==Failed'
is not picking up your failed pods anymore.
You can try running a debug pod to verify:
$ kubectl run -i --tty --rm debug --image=bitnami/kubectl:latest --restart=Never -- get pods --all-namespaces --field-selector 'status.phase==Failed'
Every Job in Kubernetes creates a Pod, so you can also look at the logs for your kubectl-runner pods:
kubectl logs kubectl-runner-xxxxx
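If you don't know the generated pod name, you can list the job pods first (just a grep over the pod list, not from the original answer):

kubectl get pods --all-namespaces | grep kubectl-runner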
Update:
Based on the log files it looks like the default:default service account doesn't have enough permissions. This would fix it:
kubectl create clusterrolebinding myadmin-binding --clusterrole=cluster-admin --serviceaccount=default:default
But if you'd like to be more restrictive, you will have to create a more limited ClusterRole, or a Role if you want it limited to a namespace.
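For example, a more restricted alternative to cluster-admin could look something like this (a sketch, assuming the CronJob runs as default:default; the names are illustrative):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: failed-pod-cleaner
rules:
- apiGroups: [""]                    # core API group
  resources: ["pods"]
  verbs: ["get", "list", "delete"]   # just enough to find and remove failed pods
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: failed-pod-cleaner
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: failed-pod-cleaner
  apiGroup: rbac.authorization.k8s.io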
Upvotes: 4