Moon.Hou

Reputation: 45

How to set the Python interpreter used by Spark workers?

I have tried several methods. 1) Setting environment variables:

export PYSPARK_DRIVER_PYTHON=/python_path/bin/python
export PYSPARK_PYTHON=/python_path/bin/python

This does not work. I'm sure PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON are set successfully, verified with:

env | grep PYSPARK_PYTHON

I want PySpark to use

 /python_path/bin/python

as the starting Python interpreter,

but the workers start with:

python -m deamon

I don't want to symlink the default python to /python_path/bin/python, because that may affect other developers: the default python and /python_path/bin/python are different versions, and both are in production use.

2) Setting this in spark-env.sh also doesn't work:

spark.pyspark.driver.python=/python_path/bin/python
spark.pyspark.python=/python_path/bin/python

When the driver starts, there are warnings like:

conf/spark-env.sh: line 63: spark.pyspark.driver.python=/python_path/bin/python: No such file or directory
conf/spark-env.sh: line 64: spark.pyspark.python=/python_path/bin/python: No such file or directory
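
From those warnings I guess spark-env.sh is sourced as a plain shell script, so the property-style lines above get executed as commands (hence "No such file or directory"). Presumably it only accepts shell statements, i.e. the same exports as before:

export PYSPARK_DRIVER_PYTHON=/python_path/bin/python
export PYSPARK_PYTHON=/python_path/bin/python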

Upvotes: 1

Views: 2895

Answers (1)

Yehor Krivokon

Reputation: 877

1) Check the permissions on your Python directory. Maybe Spark doesn't have the correct permissions. Try: sudo chmod -R 777 /python_path/bin/python

2) Spark documentation says:

Property spark.pyspark.python takes precedence if it is set.

So also try setting spark.pyspark.python in conf/spark-defaults.conf.
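
A rough sketch of the corresponding conf/spark-defaults.conf entries (using the /python_path/bin/python placeholder from your question; keys and values are whitespace-separated in this file):

spark.pyspark.driver.python   /python_path/bin/python
spark.pyspark.python          /python_path/bin/python

Note that, if I remember correctly, these properties only exist since Spark 2.1.0, so they will be ignored on older versions.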

3) Also, if you use a cluster with more than one node, you need to check that Python is installed in the correct directory on each node, because you don't know in advance where the workers will be started.

4) Spark will use the first Python interpreter available on your system PATH, so as a workaround you can put your Python directory first on the PATH.
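
A rough sketch of that workaround, assuming the driver and all workers see the same environment (the directory is the placeholder from your question):

export PATH=/python_path/bin:$PATH
which python   # should now resolve to /python_path/bin/python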

Upvotes: 0
