vamsi krishna

Reputation: 21

Pyspark integration with Jupyter

I have installed Anaconda (Python 2.7) on my machine and started the Jupyter notebook with PYSPARK_DRIVER_PYTHON=jupyter and PYSPARK_DRIVER_PYTHON_OPTS="notebook". This connects me to the Jupyter notebook, but I am unable to run even a print command: when I run it, the cell just moves to the next line without showing any output, and print is not highlighted in color.

I have already installed PySpark and it runs fine from the command prompt on my Windows machine (standalone mode), but I need to run it in a Jupyter notebook on Windows. Can anybody help me?

Upvotes: 1

Views: 1277

Answers (2)

CodeFarmer

Reputation: 2708

For the latest setup, see their official Jupyter Docker repo.

It ships Jupyter 4.x with Spark 2.1.0 and Hadoop 2.7:

docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook

Jupyter Notebook Python, Spark, Mesos Stack
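
If you also want notebooks saved inside the container to persist on your machine, a common variant (a sketch; /home/jovyan/work is the docker-stacks convention for the notebook work directory and may differ between image versions) is to mount your current directory, e.g. on Windows:

docker run -it --rm -p 8888:8888 -v "%cd%":/home/jovyan/work jupyter/pyspark-notebook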

As side notes:

1. Jupyter uses a config file, whereas IPython uses profiles (illustrated below). If you want to run Spark standalone locally, take a look at the Dockerfile and work out how the magic happens.

2. There is more gold in https://github.com/jupyter/docker-stacks
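
To illustrate the config-versus-profile difference (standard Jupyter/IPython commands; the pyspark profile name here is just an example):

# Jupyter keeps its settings in a config file:
jupyter notebook --generate-config     # writes ~/.jupyter/jupyter_notebook_config.py

# IPython, by contrast, works with profiles:
ipython profile create pyspark         # writes ~/.ipython/profile_pyspark/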


Upvotes: 0

Rahul

Reputation: 2600

This is all you need to do to set up PySpark with Jupyter on Windows when, as you mentioned, the pyspark shell is already set up correctly:

  1. Add two new environment variables:

    • PYSPARK_DRIVER_PYTHON set to jupyter
    • PYSPARK_DRIVER_PYTHON_OPTS set to notebook
  2. Run pyspark from the CMD prompt, not 'jupyter notebook' (see the sketch below).
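
For example, from a Windows command prompt (a minimal sketch; setx persists the variables for future sessions, while set only affects the current one):

:: persist the variables for future CMD sessions
setx PYSPARK_DRIVER_PYTHON jupyter
setx PYSPARK_DRIVER_PYTHON_OPTS notebook

:: or set them for the current session only, then launch
set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark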

This should solve the problem.
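
Once the notebook opens, you can verify the wiring in a cell. A minimal check, assuming the pyspark launcher pre-creates the SparkContext as sc (its default shell behavior):

# sc is created for you by the pyspark launcher
print(sc.version)                      # prints the Spark version
rdd = sc.parallelize([1, 2, 3, 4])     # small sanity-check RDD
print(rdd.sum())                       # should print 10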

Upvotes: 1
