Reputation: 171
I'm working on a POC to set up a Spark cluster that uses Kubernetes for resource management on AKS (Azure Kubernetes Service). I'm using spark-submit to submit PySpark applications to k8s in cluster mode, and I've been able to get applications running fine.
I have an Azure file share set up to store application scripts, plus a Persistent Volume and a Persistent Volume Claim pointing to this file share so that Spark can access the scripts from Kubernetes. This works fine for applications that don't write any output, like the pi.py example shipped with the Spark source code, but writing any kind of output fails in this setup. I tried running a word-count script, and the line
wordCounts.saveAsTextFile(f"./output/counts")
(where wordCounts is an RDD) causes an exception:
Traceback (most recent call last):
File "/opt/spark/work-dir/wordcount2.py", line 14, in <module>
wordCounts.saveAsTextFile(f"./output/counts")
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1570, in saveAsTextFile
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o65.saveAsTextFile.
: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted
The directory "counts" has been created from the spark application just fine, so it seems like it has required permissions, but this subsequent chmod
that spark tries to perform internally fails. I haven't been able to figure out the cause and what exact configuration I'm missing in my commands that's causing this. Any help would be greatly appreciated.
The kubectl version I'm using is:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"881d4a5a3c0f4036c714cfb601b377c4c72de543", GitTreeState:"clean", BuildDate:"2021-10-21T05:13:01Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
The Spark version is 2.4.5 and the command I'm using is:
<SPARK_PATH>/bin/spark-submit --master k8s://<HOST>:443 \
--deploy-mode cluster \
--name spark-pi3 \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=docker.io/datamechanics/spark:2.4.5-hadoop-3.1.0-java-8-scala-2.11-python-3.7-dm14 \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--verbose /opt/spark/work-dir/wordcount2.py
The PV and PVC are pretty basic. The PV yml is:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: azure-fileshare-pv
  labels:
    usage: azure-fileshare-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  azureFile:
    secretName: azure-storage-secret
    shareName: dssparktestfs
    readOnly: false
    secretNamespace: spark-operator
The PVC yml is:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: azure-fileshare-pvc
  # Set this annotation to NOT let Kubernetes automatically create
  # a persistent volume for this volume claim.
  annotations:
    volume.beta.kubernetes.io/storage-class: ""
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  selector:
    # To make sure we match the claim with the exact volume, match the label
    matchLabels:
      usage: azure-fileshare-pv
Let me know if more info is needed.
Upvotes: 4
Views: 3054
Reputation: 5277
The owner and user are root.
It looks like you've mounted your volume as root. Your problem:
chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted
is due to the fact that you are trying to change the permissions of a file that you do not own. So you need to change the owner of the file first.
The easiest solution is to run chown on the resource you want to access (a sketch of this is shown right below). However, this is often not feasible, since it can lead to privilege escalation, and the image itself may block it. In that case you can set a security context.
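For the chown route, here is a minimal sketch of a standalone test Pod that takes ownership of the mounted share before anything else runs. It assumes the PVC from the question; the Pod name, the busybox image, the volume name work-dir, and UID/GID 185 (the spark user in many Spark images) are my assumptions, so adjust them to your image:

apiVersion: v1
kind: Pod
metadata:
  name: chown-work-dir                 # illustrative name
spec:
  volumes:
    - name: work-dir                   # assumed volume name for the PVC
      persistentVolumeClaim:
        claimName: azure-fileshare-pvc
  initContainers:
    - name: fix-ownership
      image: busybox:1.35              # assumed helper image
      securityContext:
        runAsUser: 0                   # chown requires root
      command: ["sh", "-c", "chown -R 185:185 /opt/spark/work-dir"]
      volumeMounts:
        - name: work-dir
          mountPath: /opt/spark/work-dir
  containers:
    - name: check
      image: busybox:1.35
      command: ["sh", "-c", "ls -ld /opt/spark/work-dir && sleep 3600"]
      volumeMounts:
        - name: work-dir
          mountPath: /opt/spark/work-dir

If the image or cluster policy blocks this, a security context is the cleaner option.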
A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:
Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).
Security Enhanced Linux (SELinux): Objects are assigned security labels.
Running as privileged or unprivileged.
Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
AppArmor: Use program profiles to restrict the capabilities of individual programs.
Seccomp: Filter a process's system calls.
AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether the no_new_privs flag gets set on the container process. AllowPrivilegeEscalation is always true when the container is: 1) run as Privileged OR 2) has CAP_SYS_ADMIN.
readOnlyRootFilesystem: Mounts the container's root filesystem as read-only.
The above bullets are not a complete set of security context settings -- please see SecurityContext for a comprehensive list.
For more information about security mechanisms in Linux, see Overview of Linux Kernel Security Features
You can Configure volume permission and ownership change policy for Pods.
By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup. You can use the fsGroupChangePolicy field inside a securityContext to control the way that Kubernetes checks and manages ownership and permissions for a volume.
Here is an example:
securityContext:
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000
  fsGroupChangePolicy: "OnRootMismatch"
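For context, that snippet is pod-level: in a full manifest it sits under spec.securityContext, next to the volume it should apply to. A minimal sketch, where the Pod and container names are illustrative, the busybox image is an assumption, and the UID/GID values are just examples:

apiVersion: v1
kind: Pod
metadata:
  name: spark-work-dir-demo            # illustrative name
spec:
  securityContext:                     # applies to all containers in the Pod
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000                      # group applied to mounted volume contents
    fsGroupChangePolicy: "OnRootMismatch"
  volumes:
    - name: work-dir                   # assumed volume name for the PVC
      persistentVolumeClaim:
        claimName: azure-fileshare-pvc
  containers:
    - name: app                        # illustrative container
      image: busybox:1.35
      command: ["sh", "-c", "id && ls -ld /opt/spark/work-dir && sleep 3600"]
      volumeMounts:
        - name: work-dir
          mountPath: /opt/spark/work-dir

Keep in mind that with spark-submit in cluster mode you don't write the driver and executor Pod manifests yourself; in Spark 3.x a manifest like this can be supplied via spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile, whereas Spark 2.4.5 does not expose pod templates.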
See also this similar question.
Upvotes: 2