Reputation: 3770
I'm working with PySpark in a Python 3 environment. I have a dataframe and I'm trying to split a column of dense vectors into multiple columns of values. My df is this:
df_vector = kmeansModel_2.transform(finalData).select(['scalaredFeatures',
'prediction'])
df_vector.show()
+--------------------+----------+
| scalaredFeatures|prediction|
+--------------------+----------+
|[0.56785108466505...| 0|
|[1.41962771166263...| 0|
|[2.20042295307707...| 0|
|[0.14196277116626...| 0|
|[1.41962771166263...| 0|
+--------------------+----------+
Well, in order to do my task I'm using the following code:
def extract(row):
    return (row.prediction, ) + tuple(row.scalaredFeatures.toArray().tolist())

df = df_vector.rdd.map(extract).toDF(["prediction"])
Unfortunately I get an error:
Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 52.0 failed 1 times, most recent failure: Lost task
0.0 in stage 52.0 (TID 434, localhost, executor driver):
org.apache.spark.api.python.PythonException: Traceback (most recent
call last):
File "pyspark/worker.py", line 123, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in
driver 3.6, PySpark cannot run with different minor versions.Please
check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON
are correctly set.
Is there anybody who can help me with this task? Thanks!
Upvotes: 3
Views: 6821
Reputation: 1929
If you use PyCharm, you could add PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to run/debug configurations.
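If changing the run/debug configuration isn't an option, the same two variables can also be set from the driver script itself, before the SparkSession is created. A minimal sketch, assuming the interpreter running the script (sys.executable) is the Python 3 you want the workers to use as well (the app name is arbitrary):

import os
import sys

# Both variables must be set before the SparkSession (and its JVM gateway)
# starts; pointing the workers at the driver's own interpreter guarantees
# the worker and driver Python versions match.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kmeans-split").getOrCreate()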
Upvotes: 5