Shivaji Mutkule

Reputation: 1258

Spark on Kubernetes driver pod cleanup

I am running Spark 3.1.1 on Kubernetes 1.19. Once a job finishes, the executor pods get cleaned up, but the driver pod remains in Completed state. How can I clean up the driver pod once it has completed? Is there a configuration option to set?

NAME                                           READY   STATUS      RESTARTS   AGE
my-job-0e85ea790d5c9f8d-driver                 0/1     Completed   0          2d20h
my-job-8c1d4f79128ccb50-driver                 0/1     Completed   0          43h
my-job-c87bfb7912969cc5-driver                 0/1     Completed   0          43h

Upvotes: 5

Views: 5311

Answers (4)

user2566717

Reputation: 51

Kubernetes does have a Pod lifecycle and runs a PodGC controller that cleans up pods in the Failed or Succeeded phase. However, it only does so once the number of terminated pods exceeds the threshold set by terminated-pod-gc-threshold.

Garbage collection of Pods 
For failed Pods, the API objects remain in the cluster's API until a human or controller process explicitly removes them.

The Pod garbage collector (PodGC), which is a controller in the control plane, cleans up terminated Pods (with a phase of Succeeded or Failed), when the number of Pods exceeds the configured threshold (determined by terminated-pod-gc-threshold in the kube-controller-manager). This avoids a resource leak as Pods are created and terminated over time.

source: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection
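If you manage the control plane yourself, that threshold can be lowered so terminated pods are collected sooner. A minimal sketch, assuming a kubeadm-style cluster where kube-controller-manager runs as a static pod (the path and the value 100 are illustrative; the upstream default is 12500):

# Excerpt of /etc/kubernetes/manifests/kube-controller-manager.yaml
# (assumption: kubeadm layout; adjust to however your control plane is run).
spec:
  containers:
    - name: kube-controller-manager
      command:
        - kube-controller-manager
        # delete terminated (Succeeded/Failed) pods once more than 100 exist
        - --terminated-pod-gc-threshold=100

Note that this is cluster-wide behaviour and also affects terminated pods that are not Spark drivers.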

Upvotes: 0

Tom Slayer

Reputation: 70

spark.kubernetes.driver.service.deleteOnTermination was added to Spark in 3.2.0. This should solve the issue. Source: https://spark.apache.org/docs/latest/core-migration-guide.html

Update: this will only delete the service associated with the driver pod, but not the pod itself.
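For reference, on Spark 3.2.0+ the property can be set explicitly at submit time. A sketch, where the master URL, image, class and jar path are placeholders:

# Only the driver's headless service is removed on termination (see the update
# above); the completed driver pod itself is left in place.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name my-job \
  --class com.example.MyJob \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.driver.service.deleteOnTermination=true \
  local:///opt/spark/jars/my-job.jar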

Upvotes: 3

user14392764

Reputation: 61

Concerning the initial question "Spark on Kubernetes driver pod cleanup", there seems to be no way to pass a TTL parameter to Kubernetes at spark-submit time that would prevent driver pods in Completed status from lingering forever.

From the Spark documentation (https://spark.apache.org/docs/latest/running-on-kubernetes.html):

When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in “completed” state in the Kubernetes API until it’s eventually garbage collected or manually cleaned up.

It is not very clear what actually performs this "eventual garbage collection".
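Until such an option exists, a practical workaround is to delete the completed driver pods yourself. A sketch, assuming the default spark-role=driver label that Spark on Kubernetes puts on driver pods and that completed pods are in phase Succeeded:

# Delete all successfully completed Spark driver pods in a namespace.
# -l selects driver pods via the spark-role label Spark sets by default;
# --field-selector restricts deletion to pods whose phase is Succeeded.
kubectl delete pod \
  -n <namespace> \
  -l spark-role=driver \
  --field-selector=status.phase=Succeeded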

Upvotes: 4

whites11

Reputation: 13260

According to the official documentation, since Kubernetes 1.12:

Another way to clean up finished Jobs (either Complete or Failed) automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the .spec.ttlSecondsAfterFinished field of the Job. When the TTL controller cleans up the Job, it will delete the Job cascadingly, i.e. delete its dependent objects, such as Pods, together with the Job. Note that when the Job is deleted, its lifecycle guarantees, such as finalizers, will be honored.

Example:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      ...

The Job pi-with-ttl will be eligible to be automatically deleted, 100 seconds after it finishes. If the field is set to 0, the Job will be eligible to be automatically deleted immediately after it finishes.

If customisation of the Job resource is not possible, you may use an external tool to clean up completed jobs. For example, check https://github.com/dtan4/k8s-job-cleaner.
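As an alternative to an external tool, the same cleanup can be scheduled inside the cluster. A minimal sketch of a CronJob that deletes succeeded Spark driver pods every hour; the schedule, image and the spark-cleanup ServiceAccount (which needs RBAC permission to list and delete pods) are assumptions:

# Uses batch/v1beta1 for Kubernetes 1.19; on 1.21+ use apiVersion: batch/v1.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: spark-driver-cleanup
spec:
  schedule: "0 * * * *"                    # run once per hour
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: spark-cleanup   # must be allowed to list/delete pods
          restartPolicy: Never
          containers:
            - name: cleanup
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                # spark-role=driver is the label Spark on Kubernetes sets on driver pods
                - kubectl delete pod -l spark-role=driver --field-selector=status.phase=Succeeded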

Upvotes: 2
