Reputation: 21
I have installed Anaconda (Python 2.7 version) on my machine and started Jupyter Notebook by setting PYSPARK_DRIVER_PYTHON=jupyter and PYSPARK_DRIVER_PYTHON_OPTS="notebook". With this I can connect to the Jupyter notebook, but I am unable to run even a simple "print" command: when I run it, the cursor just moves to the next line with no output, and "print" is not highlighted in color.
I have already installed PySpark and it runs fine from the command prompt on my Windows machine (standalone mode), but I need to run it in a Jupyter notebook (Windows). Can anybody help me?
Upvotes: 1
Views: 1277
Reputation: 2708
For the latest setup, see their official Jupyter Docker repo.
It ships Jupyter 4.x with Spark 2.1.0 and Hadoop 2.7:
docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
Jupyter Notebook Python, Spark, Mesos Stack
As side notes:
1. Jupyter uses a config file, whereas IPython uses profiles. If you want to run Spark standalone locally, take a look at the Dockerfile to figure out how the setup works.
2. There is more gold in https://github.com/jupyter/docker-stacks
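If you want notebooks created inside the container to survive after it exits, you can mount a local folder into the container's work directory. A minimal sketch for a Windows CMD prompt, assuming the image's default notebook directory /home/jovyan/work used by the jupyter/docker-stacks images:

```shell
:: Run the PySpark notebook stack, publishing the Jupyter port and
:: mounting the current directory so notebooks are saved on the host.
docker run -it --rm -p 8888:8888 -v "%cd%":/home/jovyan/work jupyter/pyspark-notebook
```

On startup the container prints a URL with a login token; open it in a browser to reach the notebook server.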
Upvotes: 0
Reputation: 2600
This is all you need to do to set up PySpark with Jupyter on Windows, given that the PySpark shell is already working correctly as you mentioned.
1. Add two new environment variables:
   - set PYSPARK_DRIVER_PYTHON to jupyter
   - set PYSPARK_DRIVER_PYTHON_OPTS to notebook
2. Run pyspark from the CMD prompt, not 'jupyter notebook'.
This should solve the problem.
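Concretely, the steps above can be typed into a CMD session before launching PySpark. This is a sketch for the current session only; to make the variables permanent you would use setx (or the System Properties dialog) instead of set:

```shell
:: Tell PySpark to use Jupyter as the driver Python, in notebook mode
:: (these values apply only to the current CMD session)
set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS=notebook

:: Launching pyspark now starts a Jupyter notebook server as the driver
pyspark
```

In the notebook that opens, the SparkContext is available as the usual sc variable, just as in the PySpark shell.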
Upvotes: 1