Reputation: 1334
I am deploying PySpark in my AKS Kubernetes cluster using these guides:
I have deployed my driver pod as explained in the links above:
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: spark
  name: my-notebook-deployment
  labels:
    app: my-notebook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-notebook
  template:
    metadata:
      labels:
        app: my-notebook
    spec:
      serviceAccountName: spark
      containers:
      - name: my-notebook
        image: pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest
        ports:
        - containerPort: 8888
        volumeMounts:
        - mountPath: /root/data
          name: my-notebook-pv
        workingDir: /root
        resources:
          limits:
            memory: 2Gi
      volumes:
      - name: my-notebook-pv
        persistentVolumeClaim:
          claimName: my-notebook-pvc
---
apiVersion: v1
kind: Service
metadata:
  namespace: spark
  name: my-notebook-deployment
spec:
  selector:
    app: my-notebook
  ports:
  - protocol: TCP
    port: 29413
  clusterIP: None
Then I can create the Spark cluster using the following code:
import os
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
# Create Spark config for our Kubernetes based cluster manager
sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
sparkConf.setAppName("spark")
sparkConf.set("spark.kubernetes.container.image", "<MYIMAGE>")
sparkConf.set("spark.kubernetes.namespace", "spark")
sparkConf.set("spark.executor.instances", "7")
sparkConf.set("spark.executor.cores", "2")
sparkConf.set("spark.driver.memory", "512m")
sparkConf.set("spark.executor.memory", "512m")
sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark")
sparkConf.set("spark.driver.port", "29413")
sparkConf.set("spark.driver.host", "my-notebook-deployment.spark.svc.cluster.local")
# Initialize our Spark cluster; this will actually
# spawn the executor pods (the workers).
spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext
It works.
How can I create an external pod that executes a Python script living in my my-notebook-deployment pod? I can do it from my terminal:
kubectl exec my-notebook-deployment-7669bb6fc-29stw -- python3 myscript.py
But I want to be able to automate this by executing the command from inside another pod.
Upvotes: 2
Views: 8889
Reputation: 3205
You can launch a second pod based on the pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest container image inside a Kubernetes Job: https://kubernetes.io/docs/concepts/workloads/controllers/job/#running-an-example-job
If your script requires access to resources that live inside the first pod, reach them from the second pod through the my-notebook-deployment service or the my-notebook-pv volume. Note that sharing a read-write volume between pods requires them to run on the same node.
Kubernetes also offers CronJob for running Jobs on a schedule: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
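A minimal sketch of such a Job, assuming myscript.py has been saved under /root/data so it sits on the shared my-notebook-pv volume (the Job name and script path are illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  namespace: spark
  name: run-myscript        # illustrative name
spec:
  backoffLimit: 2
  template:
    spec:
      serviceAccountName: spark
      restartPolicy: Never
      containers:
      - name: run-myscript
        # same image as the notebook pod
        image: pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest
        # assumes the script was saved onto the shared volume
        command: ["python3", "/root/data/myscript.py"]
        volumeMounts:
        - mountPath: /root/data
          name: my-notebook-pv
      volumes:
      - name: my-notebook-pv
        persistentVolumeClaim:
          claimName: my-notebook-pvc
A CronJob wraps the same pod template under spec.jobTemplate and adds a schedule field if the script should run periodically.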
Upvotes: 1
Reputation: 786
In general you can spin up a new pod that runs a specified command, e.g.:
kubectl run mypod --image=python:3 --command -- <cmd> <arg1> ... <argN>
In your case you would need to provide the code of myscript.py to the pod (e.g. by mounting a ConfigMap with the script content) or build a new container image based on the official Python image with the script added to it.
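A minimal sketch of the ConfigMap approach (all names and the script body are illustrative; the ConfigMap key becomes the file name inside the mount):
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: spark
  name: myscript            # illustrative name
data:
  myscript.py: |
    print("hello from the script pod")   # placeholder for the real script body
---
apiVersion: v1
kind: Pod
metadata:
  namespace: spark
  name: run-myscript        # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: runner
    image: python:3
    command: ["python3", "/scripts/myscript.py"]
    volumeMounts:
    - mountPath: /scripts
      name: script
  volumes:
  - name: script
    configMap:
      name: myscript
Editing the script then only requires updating the ConfigMap, not rebuilding an image.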
Upvotes: 2