jakko

Reputation: 835

Launch a PySpark IPython notebook on EC2

I just upgraded to Spark 2.0 from 1.4 and downloaded the ec2 directory from github.com/amplab/spark-ec2/tree/branch-2.0

To spin up some clusters I go to my ec2 directory and run these commands:

./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>

./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>

I have my clusters up and I'm logged into the master, but I don't know how to launch a PySpark notebook. With Spark 1.4 I would run the command

IPYTHON_OPTS="notebook --ip=0.0.0.0" /root/spark/bin/pyspark --executor-memory 4G --driver-memory 4G &

and my notebook would be up and running fine, but with Spark 2.0 pyspark fails to launch this way. Can anyone help with this?

Upvotes: 0

Views: 257

Answers (1)

user7351608

Reputation: 451

According to the source comments:

https://apache.googlesource.com/spark/+/master/bin/pyspark

In Spark 2.0, IPYTHON and IPYTHON_OPTS are removed and pyspark fails to launch if either option is set in the user's environment. Instead, users should set PYSPARK_DRIVER_PYTHON=ipython to use IPython and set PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver (e.g. PYSPARK_DRIVER_PYTHON_OPTS='notebook'). This supports full customization of the IPython and executor Python executables.
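Applied to the command in the question, the Spark 2.0 invocation might look like the following. This is a sketch under the assumptions in the question (Spark installed at /root/spark on the master, a notebook server available on the PATH); swap `jupyter` for `ipython` if you prefer the name the source comment uses.

```shell
# Spark 2.0 replacement for the old IPYTHON_OPTS-based launch:
# select the driver Python and its options via environment variables.
export PYSPARK_DRIVER_PYTHON=jupyter                       # or: ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=0.0.0.0"  # passed to the driver on startup

# Same pyspark call as before; path assumed from the question.
/root/spark/bin/pyspark --executor-memory 4G --driver-memory 4G &
```

Make sure IPYTHON and IPYTHON_OPTS are unset in your environment first, since pyspark now refuses to start if either is still set.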

The following link walks through it step by step. Along with upgrading to Spark 2.0, you should also upgrade from IPython Notebook to its successor, Jupyter Notebook.

Upvotes: 1
