Reputation: 239
I have a PySpark job present locally on my laptop. If I want to submit it on my minikube cluster using spark-submit, any idea how to pass the python file ?
I'm using following command, but it isn't working
./spark-submit \
--master k8s://https://192.168.64.6:8443 \
--deploy-mode cluster \
--name amazon-data-review \
--conf spark.kubernetes.namespace=jupyter \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.driver.limit.cores=1 \
--conf spark.executor.cores=1 \
--conf spark.executor.memory=500m \
--conf spark.kubernetes.container.image=prateek/spark-ubuntu-2.4.5 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.container.image.pullSecrets=dockerlogin \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=s3a://prateek/spark-hs/ \
--conf spark.hadoop.fs.s3a.access.key=xxxxx \
--conf spark.hadoop.fs.s3a.secret.key=xxxxx \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.fast.upload=true \
/Users/prateek/apache-spark/amazon_data_review.py
Getting following error -
python3: can't open file '/Users/prateek/apache-spark/amazon_data_review.py': [Errno 2] No such file or directory
Is it required to keep the file within the Docker image itself. Can't we run it locally by keeping it on laptop
Upvotes: 2
Views: 1657
Reputation: 991
Spark on Kubernetes doesn't support submitting locally stored files with spark-submit
.
What you could do to make it work in cluster mode is to build Spark Docker image based on prateek/spark-ubuntu-2.4.5
with amazon_data_review.py
put inside of it (eg using Docker COPY /Users/prateek/apache-spark/amazon_data_review.py /amazon_data_review.py
statement).
Then just refer to it in the spark-submit
command using local://
file system, eg.:
spark-submit \
--master ... \
--conf ... \
...
local:///amazon_data_review.py
The alternative is to store that file on http(s)://
or hdfs://
-like accessible location.
Upvotes: 4
Reputation: 239
It's solved. Running it with client mode helped to run it
--deploy-mode client
Upvotes: -1