Jakub Vlček

Reputation: 35

Python dependencies in the Kubeflow Spark Operator

Is there a way to use Python code packaged as a wheel (.whl), an .egg, or a plain .py file as a dependency in the Kubeflow Spark Operator?

The manifest I have in mind would look something like this. The dependency would go under either jars or files; I presume files makes more sense:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-python
  namespace: default
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: spark:3.5.3
  imagePullPolicy: IfNotPresent
  mainApplicationFile: local:///path/to/my/python/script.py
  deps:
    jars:
      - local:///path/to/python/functions.py
    files:
      - gs://path/to/python/functions.py
  sparkVersion: 3.5.3
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 1
    cores: 1
    memory: 512m

Upvotes: 0

Views: 63

Answers (1)

Jakub Vlček

Reputation: 35

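Yes, Python files can be used as dependencies (see link): list them under deps.pyFiles. This has worked for me, with the files made available to the driver and executor through a mounted volume: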

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: view-creator-test
  namespace: default
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: spark:3.5.3
  imagePullPolicy: IfNotPresent
  mainApplicationFile: local:///path/to/my/python/script.py
  arguments: []
  sparkVersion: 3.5.3
  deps:
    pyFiles:
      - local:///mnt/spark/dependency_1.py
      - local:///mnt/spark/dependency_2.py
  driver:
    labels:
      version: 3.5.3
    cores: 1
    memory: 512m
    volumeMounts:
      - name: view-creator-volume
        mountPath: /mnt/spark
  executor:
    labels:
      version: 3.5.3
    instances: 1
    cores: 1
    memory: 512m
    volumeMounts:
      - name: view-creator-volume
        mountPath: /mnt/spark
  volumes:
    - name: view-creator-volume
      persistentVolumeClaim:
        claimName: view-creator-pvc
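
If there are more than a couple of helper modules, one option is to bundle them into a single .zip and list that archive under deps.pyFiles instead of each .py file. This relies on Spark's --py-files mechanism, which is documented for .py, .zip, and .egg files; wheels are not listed there, so treat .whl support as unverified. The sketch below is only an illustration under those assumptions: bundle_modules, demo, and deps.zip are hypothetical names, and dependency_1 stands in for the module from the manifest above. It builds the archive with the standard library and checks that Python can import from it once the archive is on sys.path, which is roughly what happens on the driver and executors.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def bundle_modules(source_dir: Path, archive_path: Path) -> Path:
    """Zip every .py file under source_dir, keeping paths relative to it."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for py_file in sorted(source_dir.rglob("*.py")):
            archive.write(py_file, py_file.relative_to(source_dir).as_posix())
    return archive_path


def demo() -> str:
    """Create a throwaway module, bundle it, and import it from the zip."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        # Hypothetical stand-in for dependency_1.py from the manifest.
        (src / "dependency_1.py").write_text(
            "def greet():\n    return 'hello from dependency_1'\n"
        )
        archive = bundle_modules(src, Path(tmp) / "deps.zip")

        # Putting the archive on sys.path lets Python import from it,
        # similar to how entries passed via --py-files become importable.
        sys.path.insert(0, str(archive))
        try:
            module = importlib.import_module("dependency_1")
            return module.greet()
        finally:
            sys.path.remove(str(archive))


if __name__ == "__main__":
    print(demo())
```

With an archive like this copied to the mounted volume, the pyFiles entry would be a single line such as local:///mnt/spark/deps.zip (a hypothetical path), and script.py would import dependency_1 as usual.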

Upvotes: 1
