Reputation: 835
I just upgraded from Spark 1.4 to Spark 2.0 and downloaded the ec2 directory from github.com/amplab/spark-ec2/tree/branch-2.0.
To spin up some clusters, I go to my ec2 directory and run these commands:
./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>
./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>
I have my clusters up and I'm logged into the master, but I don't know how to launch a pyspark notebook. With Spark 1.4 I would run the command
IPYTHON_OPTS="notebook --ip=0.0.0.0" /root/spark/bin/pyspark --executor-memory 4G --driver-memory 4G &
and I have my notebook up and running fine, but with Spark 2.0 there is no bin/pyspark script that works this way. Can anyone help with this?
Upvotes: 0
Views: 257
Reputation: 451
According to the source comments:
https://apache.googlesource.com/spark/+/master/bin/pyspark
In Spark 2.0, IPYTHON and IPYTHON_OPTS are removed and pyspark fails to launch if either option is set in the user's environment. Instead, users should set PYSPARK_DRIVER_PYTHON=ipython to use IPython and set PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver (e.g. PYSPARK_DRIVER_PYTHON_OPTS='notebook'). This supports full customization of the IPython and executor Python executables.
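In other words, the Spark 1.4 command from the question would become something like this in Spark 2.0 (a sketch only; it assumes ipython is on the PATH on the master, and reuses the memory flags from your original command):
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=0.0.0.0" /root/spark/bin/pyspark --executor-memory 4G --driver-memory 4G &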
The following link will take you through it step by step. Along with upgrading to Spark 2.0, you should also upgrade to Jupyter Notebook (formerly IPython Notebook).
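For example, on the master node (again a sketch, assuming pip is available there and you want the Jupyter front end rather than plain IPython):
pip install --upgrade jupyter
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=0.0.0.0" /root/spark/bin/pyspark --executor-memory 4G --driver-memory 4G &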
Upvotes: 1