Reputation: 4561
I followed this link to install Spark in Standalone mode on a cluster by placing pre-built versions of Spark on each node of the cluster and running ./sbin/start-master.sh on the master and ./sbin/start-slave.sh <master-spark-URL> on each slave.
How do I continue from there to set up a pyspark application, for example in an ipython notebook, so that it utilizes the cluster?
Do I need to install ipython on my local machine (laptop)?
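For reference, the startup steps look roughly like this (spark://<master-host>:7077 is a placeholder for the actual master URL that start-master.sh reports in its log and web UI):
# On the master node: start the standalone master
./sbin/start-master.sh
# On each slave node: start a worker and register it with the master
./sbin/start-slave.sh spark://<master-host>:7077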
Upvotes: 3
Views: 1630
Reputation: 4499
To use ipython to run pyspark, you'll need to add the following environment variables to your .bashrc:
export PYSPARK_DRIVER_PYTHON=ipython2 # As pyspark only works with python2 and not python3
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
This will cause an ipython2 notebook to be launched when you run pyspark from the shell.
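To have the notebook actually run jobs on the cluster rather than in local mode, point pyspark at your standalone master when you launch it. A minimal sketch, assuming the default master port 7077 and a placeholder hostname:
# Launch pyspark (which now opens an ipython2 notebook) against the standalone master
pyspark --master spark://<master-host>:7077
The SparkContext that pyspark pre-creates in the notebook (sc) should then submit its jobs to the cluster workers.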
Note: I assume you already have ipython notebook installed. If not, the easiest way to get it is to install the Anaconda Python distribution.
Upvotes: 2