Techiescorner

Reputation: 811

kubernetes init container for spark-submit

I am running spark-submit against a Kubernetes cluster with a Spark 3.2.1 image, and it works. My question is: can I run an init container along with the spark-submit? What I am trying to achieve is for the init container to check whether another service is up; if it is up, the Spark job runs, otherwise it fails.

I can see a conf parameter "spark.kubernetes.initContainer.image" for Spark 2.3, but not for 3.2.1 (https://spark.apache.org/docs/2.3.0/running-on-kubernetes.html).

Is there any mechanism I can use to check whether other services are up before I submit a Spark job?

I can see init container usage for Spark in the links below, but they do not give a clear answer:

https://docs.bitnami.com/kubernetes/infrastructure/spark/configuration/configure-sidecar-init-containers/
https://doc.lucidworks.com/spark-guide/11153/running-spark-on-kubernetes

Any help will be much appreciated, thanks.

Upvotes: 1

Views: 1523

Answers (3)

beatrice

Reputation: 4391

You can define a pod template for your pods and pass it to spark-submit:

./bin/spark-submit \
  --master k8s://50.1.0.4:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=spark:v3.2.1 \
  --conf spark.kubernetes.driver.podTemplateFile=//path/my_pod_template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=//path/my_pod_template.yaml \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar

Note that a template doesn't have to contain all the fields a Spark app needs to function. Its main purpose, as described in the official docs, is:

Spark users can similarly use template files to define the driver or executor pod configurations that Spark configurations do not support.

That means that many fields will be overridden based on --conf values. In my case I didn't want to specify the main container spec; I only needed the initContainer to make some init checks. In my setup, the volumes and env vars available to the main container were also available to the init container without explicitly adding them to the pod template.

my_pod_template.yaml (something like the one in Alan's answer):

spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]

source: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template
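
As a hedged sketch of the "init container only" variant described above: the template below declares no main container and leaves that to Spark, which fills it in from the --conf values. The apiVersion/kind header and the single init container are assumptions for illustration; the service name myservice is the placeholder used in the template above, not a real endpoint.

```yaml
# my_pod_template.yaml: a sketch that declares only an init container.
# Assumption: Spark supplies the main driver/executor container itself,
# using spark.kubernetes.container.image and the other --conf values.
apiVersion: v1
kind: Pod
spec:
  initContainers:
  - name: init-myservice   # placeholder name; blocks until the DNS lookup succeeds
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
```

If the same file is passed as both the driver and the executor template, as in the command above, the check runs before every driver and executor pod starts. Passing it only through spark.kubernetes.driver.podTemplateFile would gate just the driver.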

Upvotes: 1

Techiescorner

Reputation: 811

I found that the best way to submit a Spark job is the Spark operator (sparkoperator); more details can be found at the GitHub link.

It has an option to include an init container and a sidecar container.
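
A minimal sketch of what that could look like, assuming the operator's v1beta2 SparkApplication resource and its driver.initContainers field. The application name, resource sizes, and the myservice lookup are illustrative placeholders, and depending on the operator version the mutating admission webhook may need to be enabled for init containers to be applied.

```yaml
# Sketch only: a SparkApplication whose driver waits for a service first.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi            # hypothetical application name
spec:
  type: Scala
  mode: cluster
  image: spark:v3.2.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
  sparkVersion: 3.2.1
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
    initContainers:
    - name: init-myservice  # placeholder check; blocks until the lookup succeeds
      image: busybox:1.28
      command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done']
  executor:
    instances: 1
    cores: 1
    memory: 512m
```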

Upvotes: 1

Alan

Reputation: 819

You don't mention whether the other service is in the same container or not, but the principles are the same. It's covered in the docs here, which give this example defining a simple Pod that has two init containers. The first waits for myservice, and the second waits for mydb. Once both init containers complete, the Pod runs the app container from its spec section.

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
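
Note that nslookup only confirms that the Service name resolves, and the loops above wait forever. If the goal is "run if the service is up, otherwise fail", a bounded check against the service itself may fit better. The fragment below is a sketch under assumptions: the health URL http://myservice:8080/health and the 30-attempt limit are hypothetical, and it relies on the busybox wget applet supporting --spider and -T.

```yaml
spec:
  initContainers:
  - name: check-myservice
    image: busybox:1.28
    command:
    - sh
    - -c
    - |
      # Probe the (hypothetical) health endpoint up to 30 times, 2 s apart.
      i=0
      while [ "$i" -lt 30 ]; do
        if wget -q -T 2 --spider http://myservice:8080/health; then
          echo "myservice is up"
          exit 0
        fi
        i=$((i + 1))
        echo "waiting for myservice ($i/30)"
        sleep 2
      done
      # A non-zero exit marks the init container as failed.
      echo "myservice did not come up, giving up"
      exit 1
```

When the pod's restartPolicy is Never, a failed init container causes Kubernetes to treat the whole pod as failed, so the main container never starts.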

Upvotes: 0
