J.C Guzman

Reputation: 1334

Execute a bash command in a pod that runs a Python script living in another pod in Kubernetes

I am deploying PySpark in my AKS Kubernetes cluster using these guides:

I have deployed my driver pod as explained in the links above:

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: spark
  name: my-notebook-deployment
  labels:
    app: my-notebook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-notebook
  template:
    metadata:
      labels:
        app: my-notebook
    spec:
      serviceAccountName: spark
      containers:
      - name: my-notebook
        image: pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest
        ports:
          - containerPort: 8888
        volumeMounts:
          - mountPath: /root/data
            name: my-notebook-pv
        workingDir: /root
        resources:
          limits:
            memory: 2Gi
      volumes:
        - name: my-notebook-pv
          persistentVolumeClaim:
            claimName: my-notebook-pvc
---
apiVersion: v1
kind: Service
metadata:
  namespace: spark
  name: my-notebook-deployment
spec:
  selector:
    app: my-notebook
  ports:
    - protocol: TCP
      port: 29413
  clusterIP: None

Then I can create the Spark cluster using the following code:

import os
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
# Create Spark config for our Kubernetes based cluster manager
sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
sparkConf.setAppName("spark")
sparkConf.set("spark.kubernetes.container.image", "<MYIMAGE>")
sparkConf.set("spark.kubernetes.namespace", "spark")
sparkConf.set("spark.executor.instances", "7")
sparkConf.set("spark.executor.cores", "2")
sparkConf.set("spark.driver.memory", "512m")
sparkConf.set("spark.executor.memory", "512m")
sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark")
sparkConf.set("spark.driver.port", "29413")
sparkConf.set("spark.driver.host", "my-notebook-deployment.spark.svc.cluster.local")
# Initialize our Spark cluster, this will actually
# generate the worker nodes.
spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext

It works.

How can I create an external pod that executes a Python script living in my my-notebook-deployment pod? From my terminal I can do it with:

kubectl exec my-notebook-deployment-7669bb6fc-29stw -- python3 myscript.py

But I want to automate this by running the command from inside another pod.

Upvotes: 2

Views: 8889

Answers (2)

Fabrice Jammes

Reputation: 3205

You can launch a second pod based on the pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest container image inside a k8s Job: https://kubernetes.io/docs/concepts/workloads/controllers/job/#running-an-example-job

If your script requires access to resources available inside the first pod, you have to use the service my-notebook-deployment or the volume my-notebook-pv to reach them from the second pod. Note that sharing a read-write volume between pods requires the pods to run on the same node.

k8s also offers CronJob for scheduled runs: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
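For instance, a Job along these lines could run the script. This is a minimal sketch assuming myscript.py sits on the shared PVC under /root/data (the Job name run-myscript is made up; adjust the script path if it lives elsewhere in the image):

apiVersion: batch/v1
kind: Job
metadata:
  namespace: spark
  name: run-myscript
spec:
  backoffLimit: 2
  template:
    spec:
      serviceAccountName: spark
      restartPolicy: Never
      containers:
      - name: run-myscript
        image: pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest
        # Run the script directly instead of the image's default entrypoint
        command: ["python3", "/root/data/myscript.py"]
        volumeMounts:
          # Same PVC the notebook deployment mounts, so the script is visible here
          - mountPath: /root/data
            name: my-notebook-pv
      volumes:
        - name: my-notebook-pv
          persistentVolumeClaim:
            claimName: my-notebook-pvc

A CronJob wraps the same pod template in a schedule field if the script has to run periodically.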

Upvotes: 1

pb100

Reputation: 786

In general you can spin up a new pod with a specified command running in it, e.g.:

kubectl run mypod --image=python:3 --command -- <cmd> <arg1> ... <argN>

In your case you would need to provide the code of myscript.py to the pod (e.g. by mounting a ConfigMap with the script content) or build a new container image based on the Python Docker image with the script added to it.
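A minimal sketch of the ConfigMap route, with made-up names (myscript-config, run-myscript-pod) and a stand-in script body:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: spark
  name: myscript-config
data:
  myscript.py: |
    print("hello from myscript")
---
apiVersion: v1
kind: Pod
metadata:
  namespace: spark
  name: run-myscript-pod
spec:
  restartPolicy: Never
  containers:
  - name: runner
    image: python:3
    # The ConfigMap is mounted as a directory, so the script appears as a file
    command: ["python3", "/scripts/myscript.py"]
    volumeMounts:
      - mountPath: /scripts
        name: script-volume
  volumes:
    - name: script-volume
      configMap:
        name: myscript-config

You can also create the ConfigMap straight from the existing file with kubectl create configmap myscript-config --from-file=myscript.py -n spark, instead of inlining the script in YAML.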

Upvotes: 2
