Spark submit fails in k8s Cluster

I am trying to submit a Spark application to a minikube k8s cluster (Spark version used: 2.4.3) using the command below:

spark-submit \
--master <K8S_MASTER> \
--deploy-mode cluster \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=<my docker image> \
--conf spark.kubernetes.driver.pod.name=spark-py-driver \
--conf spark.executor.memory=2g \
--conf spark.driver.memory=2g \
local:///home/proj/app/run.py <arguments>
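
(For context, <K8S_MASTER> is the minikube API server address prefixed with k8s://, which spark-submit requires; the address shown below is only a placeholder.)

kubectl cluster-info
# e.g. "Kubernetes master is running at https://192.168.99.100:8443"
# so <K8S_MASTER> would be k8s://https://192.168.99.100:8443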

Please note that the Python script run.py exists in my Docker image at the same path. Once I do the spark-submit, the Spark job starts and the driver pod gets killed. I could see only the log below in the driver pod:

[FATAL tini (6)] exec driver-py failed: No such file or directory
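
For reference, the driver log can be fetched with something like the following (the pod name is taken from the events further down; the namespace is a placeholder):

kubectl -n <namespace> logs run-py-1590847453453-driver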

I have verified the execution of the PySpark job by doing a docker run on the Docker image and was able to see that the above Python code gets executed.
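
Such a check can be done roughly along these lines (the spark-submit path inside the image and the local master are assumptions, used only to exercise run.py outside the cluster):

docker run --rm <my docker image> \
  /opt/spark/bin/spark-submit --master "local[2]" /home/proj/app/run.py <arguments>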

These are the events for the failed driver pod:

Events:

  Type     Reason       Age   From               Message
  ----     ------       ----  ----               -------
  Normal   Scheduled    52m   default-scheduler  Successfully assigned ***-develop/run-py-1590847453453-driver to minikube
  Warning  FailedMount  52m   kubelet, minikube  MountVolume.SetUp failed for volume "spark-conf-volume" : configmap "run-py-1590847453453-driver-conf-map" not found
  Normal   Pulled       52m   kubelet, minikube  Container image "******************:latest" already present on machine
  Normal   Created      52m   kubelet, minikube  Created container spark-kubernetes-driver
  Normal   Started      52m   kubelet, minikube  Started container spark-kubernetes-driver

Upvotes: 2

Views: 2005

Answers (2)

Tikchbila

Reputation: 1

The spark-submit you are using is not a 3.0.0 build, but the base image appears to be built on Spark 3.0.0, whose entrypoint no longer recognizes the driver-py argument that a 2.4.x spark-submit passes; tini therefore tries to exec driver-py directly and fails. You also need to upgrade the Spark installation that provides spark-submit to version 3.0.0.
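
A quick way to confirm the two versions in play is to compare the spark-submit on the submitting machine with the Spark build baked into the image (the spark-submit path inside the image is an assumption):

spark-submit --version
docker run --rm --entrypoint /opt/spark/bin/spark-submit <my docker image> --version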

Upvotes: 0

I am using one of the base images from my org, but the issue regarding the mount is only a warning, and the pod was successfully assigned after that. This is the Dockerfile:

# build arg declared so that ${SPARK_ALPINE_BUILD} resolves in the FROM line
ARG SPARK_ALPINE_BUILD
FROM <project_repo>/<proj>/${SPARK_ALPINE_BUILD}
ENV SPARK_OPTS --driver-java-options=-Dlog4j.logLevel=info
ENV SPARK_MASTER "spark://spark-master:7077"

# add the MySQL JDBC and Cassandra connector jars to Spark's classpath
ADD https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar $SPARK_HOME/jars/
ADD https://repo1.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2.11/2.3.2/spark-cassandra-connector_2.11-2.3.2.jar $SPARK_HOME/jars/
USER root

# set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

WORKDIR /home/<proj>/app
# copy files
COPY src/configs ./configs
COPY src/dependencies ./dependencies
COPY src/jobs ./jobs
COPY src/run.py ./run.py
COPY run.sh ./run.sh 
COPY src/requirements.txt . 

# install packages here 
RUN set -e; \
  pip install --no-cache-dir -r requirements.txt;
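
As a sanity check against this image, it is also worth confirming that run.py really ends up at the exact path passed to spark-submit, for example:

docker run --rm --entrypoint ls <my docker image> -l /home/proj/app/run.py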

Upvotes: 1
