Reputation: 1
I am trying to submit PySpark code with a pandas UDF (to use fbprophet...). It works fine when submitted locally, but a cluster submit fails with an error such as:
Job aborted due to stage failure: Task 2 in stage 2.0 failed 4 times, most recent failure: Lost task 2.3 in stage 2.0 (TID 41, ip-172-31-11-94.ap-northeast-2.compute.internal, executor 2): java.io.IOException: Cannot run program
"/mnt/yarn/usercache/hadoop/appcache/application_1620263926111_0229/container_1620263926111_0229_01_000001/environment/bin/python": error=2, No such file or directory
My spark-submit command:
PYSPARK_PYTHON=./environment/bin/python \
spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
--conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python \
--jars jars/org.elasticsearch_elasticsearch-spark-20_2.11-7.10.2.jar \
--py-files dependencies.zip \
--archives ./environment.tar.gz#environment \
--files config.ini \
$1
I built environment.tar.gz with conda-pack; dependencies.zip holds my local packages, and config.ini contains settings the job loads.
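For reference, environment.tar.gz was built along these lines (the environment name prophet_env, channel, and package list here are illustrative):
# Illustrative recipe; environment name, channel, and packages are assumptions
conda create -y -n prophet_env -c conda-forge python=3.7 fbprophet pyarrow conda-pack
conda activate prophet_env
conda pack -n prophet_env -o environment.tar.gz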
Is there any way to handle this problem?
Upvotes: 0
Views: 2763
Reputation: 159
You can't use a local path here:
--archives ./environment.tar.gz#environment
Publish environment.tar.gz on HDFS:
venv-pack -o environment.tar.gz
# or conda pack
hdfs dfs -put -f environment.tar.gz /spark/app_name/
hdfs dfs -chmod 0664 /spark/app_name/environment.tar.gz
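You can verify the upload before resubmitting:
hdfs dfs -ls /spark/app_name/environment.tar.gz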
Then change the --archives argument of spark-submit:
--archives hdfs:///spark/app_name/environment.tar.gz#environment
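Putting it together, the submit command from the question stays the same except for the --archives line (a sketch; all other flags are copied from the question):
PYSPARK_PYTHON=./environment/bin/python \
spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
--conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python \
--jars jars/org.elasticsearch_elasticsearch-spark-20_2.11-7.10.2.jar \
--py-files dependencies.zip \
--archives hdfs:///spark/app_name/environment.tar.gz#environment \
--files config.ini \
$1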
More info: PySpark on YARN in self-contained environments
Upvotes: 1