Reputation: 115
Currently we are using EMR version 5.23.0 to submit our PySpark jobs. We want to upgrade to Python 3.7 and would like to confirm whether Python 3.7 is supported on our current EMR version 5.23.0. There is no official AWS information on this version compatibility. Can anybody please help me find this out?
Upvotes: 2
Views: 4287
Reputation: 1459
From experiment, Python 3.7 is supported on EMR from version 5.30.x and on version 6.0, with PySpark pointed at it through the config:
'spark.pyspark.python', 'python3'
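For reference, that property can also be set at cluster creation time through an EMR configuration object; a sketch, assuming the standard `spark-defaults` classification for Spark properties:

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.pyspark.python": "python3"
    }
  }
]
```

Pass this JSON as the `--configurations` argument to `aws emr create-cluster` (or the equivalent field in the console) and every Spark job on the cluster will pick up `python3` without per-job flags.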
Upvotes: 1
Reputation: 1411
Currently EMR comes with Python v3.6.x.
But I'd suggest not replacing Python v3.6.x; instead, install Miniconda during bootstrap. Miniconda gives you the freedom to choose your Python version.
Install libraries using conda install <library-name>, but do not install pyspark: it will already be there with all the configuration, and installing pyspark separately will cause configuration issues.
You can also create your own AMI with everything pre-installed. This will reduce your bootstrap time.
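A minimal sketch of such a bootstrap action, assuming the standard Miniconda installer URL and an install prefix of your choosing (the pinned Python version and the example libraries are placeholders; adjust to your needs):

```shell
#!/bin/bash
# EMR bootstrap action: install Miniconda with a chosen Python version (sketch).
set -euxo pipefail

MINICONDA_HOME=/<path-miniconda-home>   # your chosen install prefix

# Download and run the Miniconda installer non-interactively (-b) into the prefix (-p)
wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh
bash /tmp/miniconda.sh -b -p "$MINICONDA_HOME"

# Pin the Python version and install job dependencies with conda
"$MINICONDA_HOME/bin/conda" install -y python=3.7 numpy pandas
# Note: do NOT conda install pyspark -- EMR's own PySpark is already configured.
```

Upload the script to S3 and reference it as a bootstrap action when creating the cluster, so it runs on every node before the applications start.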
For pyspark shell: add these 2 environment variables
export PYSPARK_DRIVER_PYTHON=/<path-miniconda-home>/bin/python
export PYSPARK_PYTHON=/<path-miniconda-home>/bin/python
For spark-submit: add these configurations
--conf spark.executorEnv.PYSPARK_DRIVER_PYTHON=/<path-miniconda-home>/bin/python
--conf spark.executorEnv.PYSPARK_PYTHON=/<path-miniconda-home>/bin/python
--conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/<path-miniconda-home>/bin/python
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/<path-miniconda-home>/bin/python
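Putting the flags above together, a full spark-submit invocation would look like this (job.py stands in for your application; the Miniconda path is the same placeholder as above):

```shell
PY=/<path-miniconda-home>/bin/python   # the Miniconda Python installed at bootstrap

spark-submit \
  --conf spark.executorEnv.PYSPARK_DRIVER_PYTHON="$PY" \
  --conf spark.executorEnv.PYSPARK_PYTHON="$PY" \
  --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON="$PY" \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON="$PY" \
  job.py
```

The `executorEnv` pair covers the executors and the `yarn.appMasterEnv` pair covers the YARN application master, so driver and executors all resolve the same interpreter.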
Upvotes: 1
Reputation: 35258
Looks like EMR comes with Python 3.6 as of version 5.20.
You could of course replace it yourself, but that setup won't be officially supported.
Upvotes: 0