Reputation: 2734
I am aware of Change Apache Livy's Python Version and How do i setup Pyspark in Python 3 with spark-env.sh.template, and I have also read the Livy documentation.
However, none of that works: Livy keeps using Python 2.7 no matter what.
This is running Livy 0.6.0 on an EMR cluster.
I have changed the PYSPARK_PYTHON environment variable to /usr/bin/python3 for the hadoop user, my own user, root, and ec2-user. Logging into the EMR master node via SSH and running pyspark starts python3 as expected, but Livy keeps using python2.7.
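For context, the environment-variable change was along these lines (a sketch; which profile file each user reads is an assumption, on EMR it is typically ~/.bashrc):

```sh
# Added to each user's shell profile, e.g. ~/.bashrc (exact file is an assumption)
export PYSPARK_PYTHON=/usr/bin/python3
```

After re-logging in, running pyspark on the master node picks this up, which matches the behavior described above.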
I added export PYSPARK_PYTHON=/usr/bin/python3
to the /etc/spark/conf/spark-env.sh
file. Livy keeps using python2.7.
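That is, the file ends up containing the line below (shown only to make the step concrete):

```sh
# Appended to /etc/spark/conf/spark-env.sh
export PYSPARK_PYTHON=/usr/bin/python3
```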
I added "spark.yarn.appMasterEnv.PYSPARK_PYTHON":"/usr/bin/python3"
and "spark.executorEnv.PYSPARK_PYTHON":"/usr/bin/python3"
to the items listed below and in every case . Livy keeps using python2.7.
config.json
and config_other_settings.json
files before starting a PySpark kernel Jupyter%manage_spark
Jupyter widget. Livy keeps using python2.7.%%spark config
cell-magic before the line-magic %spark add --session test --url http://X.X.X.X:8998 --auth None --language python
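For the sparkmagic config files, the attempted change looked roughly like this (a sketch assuming the standard sparkmagic session_configs layout; unrelated keys omitted):

```json
{
  "session_configs": {
    "conf": {
      "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "/usr/bin/python3",
      "spark.executorEnv.PYSPARK_PYTHON": "/usr/bin/python3"
    }
  }
}
```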
Note: this works without any issue on another EMR cluster running Livy 0.7.0. I have gone over all of the settings on the other cluster and cannot find what is different; I did not have to do any of this there, Livy just used python3 by default.
How exactly do I get Livy to use python3 instead of python2?
Upvotes: 2
Views: 1548
Reputation: 2734
I finally found an answer just after posting.
I ran the following in a cell of the Jupyter PySpark kernel, before running any code that starts the PySpark session on the remote EMR cluster via Livy:
```
%%configure -f
{
  "conf": {
    "spark.pyspark.python": "python3"
  }
}
```
Simply adding "spark.pyspark.python": "python3" to the session configuration in the .sparkmagic config.json or config_other_settings.json file also worked.
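In config.json that would look something like this (again a sketch assuming the standard sparkmagic session_configs layout):

```json
{
  "session_configs": {
    "conf": {
      "spark.pyspark.python": "python3"
    }
  }
}
```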
It is confusing that this does not match the official Livy documentation. A possible explanation: spark.pyspark.python is a regular Spark property, and Spark uses it in preference to the PYSPARK_PYTHON environment variable when both are set, which would explain why it works where the environment-variable approaches did not.
Upvotes: 1