Reputation: 811
I am trying to run spark-submit against a Kubernetes cluster with the Spark 3.2.1 image, and it works. My question is: can I run an init container along with the spark-submit? What I am trying to achieve is that the init container checks whether another service is up; if it is, the spark-submit runs, otherwise it fails.
I can see a conf parameter "spark.kubernetes.initContainer.image" for Spark 2.3 (https://spark.apache.org/docs/2.3.0/running-on-kubernetes.html), but not for 3.2.1.
Is there any mechanism I can use to check whether other services are up before I submit a Spark job?
I can see init container usage for Spark in the links below, but they do not give an exact answer:
https://docs.bitnami.com/kubernetes/infrastructure/spark/configuration/configure-sidecar-init-containers/
https://doc.lucidworks.com/spark-guide/11153/running-spark-on-kubernetes
Any help will be much appreciated, thanks.
Upvotes: 1
Views: 1523
Reputation: 4391
You can define a pod template for your driver and executor pods:
./bin/spark-submit \
  --master k8s://50.1.0.4:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=spark:v3.2.1 \
  --conf spark.kubernetes.driver.podTemplateFile=/path/my_pod_template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=/path/my_pod_template.yaml \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
Note that the template doesn't have to contain all the fields necessary for the Spark app to function. Its main purpose, as described in the official docs, is:
Spark users can similarly use template files to define the driver or executor pod configurations that Spark configurations do not support.
That means that many, if not most, fields will be overridden based on the --conf values. In my case I didn't want to specify the main container spec; I only needed the initContainer to perform some init checks. Needless to say, all volumes and env vars available to the main container are also available to the init container, without explicitly adding them to the pod template.
my_pod_template.yaml (something like in Alan's answer):
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
source: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template
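Since Spark fills in the driver and executor containers itself, the template can in principle be stripped down to just the init check. A minimal sketch, where wait-for-myservice and the myservice hostname are placeholders:

```yaml
# my_pod_template.yaml - only the init container; Spark injects the main container
spec:
  initContainers:
  - name: wait-for-myservice
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done']
```

If the init container loops forever, the driver pod never starts and the job eventually fails, which is the desired gating behavior.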
Upvotes: 1
Reputation: 811
I found that the best way to submit a Spark job is the Spark Operator; more details can be found in the GitHub link.
It has options to include both an init container and a sidecar container.
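For reference, a SparkApplication manifest with an init container might look roughly like this. This is a sketch based on my reading of the spark-on-k8s-operator CRD (the initContainers field under driver/executor); the busybox check and myservice name are placeholders:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  type: Scala
  mode: cluster
  image: spark:v3.2.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
  driver:
    serviceAccount: spark
    initContainers:
    - name: wait-for-myservice
      image: busybox:1.28
      command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done']
  executor:
    instances: 1
```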
Upvotes: 1
Reputation: 819
You don't mention whether the other service is in the same cluster or not, but the principle is the same either way. It's covered in the docs here, which give this example of a simple Pod with two init containers. The first waits for myservice and the second waits for mydb. Once both init containers complete, the Pod runs the app container from its spec section.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
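An alternative to init containers is to gate the spark-submit itself from wherever you run it. A minimal sketch, assuming nc (netcat) is installed and that myservice:8080 is the endpoint to check; the function name and the submit command are placeholders:

```shell
#!/bin/sh
# Poll a TCP endpoint until it accepts connections, then run spark-submit.
wait_for_service() {
  host="$1"; port="$2"; retries="${3:-30}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if nc -z "$host" "$port" 2>/dev/null; then
      echo "$host:$port is up"
      return 0
    fi
    echo "waiting for $host:$port"
    sleep 2
    i=$((i + 1))
  done
  echo "$host:$port never came up" >&2
  return 1
}

# wait_for_service myservice 8080 && ./bin/spark-submit ...
```

This only checks TCP reachability; for an HTTP health endpoint you could swap the nc call for a curl check.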
Upvotes: 0