Reputation: 67
I'm following this site to install Jupyter Notebook and PySpark and to integrate the two.
When I needed to create the "Jupyter profile", I read that "Jupyter profiles" no longer exist, so I continued by executing the following lines:
$ mkdir -p ~/.ipython/kernels/pyspark
$ touch ~/.ipython/kernels/pyspark/kernel.json
I opened kernel.json and wrote the following:
{
  "display_name": "pySpark",
  "language": "python",
  "argv": [
    "/usr/bin/python",
    "-m",
    "IPython.kernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7",
    "PYTHONPATH": "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7/python:/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip",
    "PYTHONSTARTUP": "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "pyspark-shell"
  }
}
The paths of Spark are correct.
But when I run jupyter console --kernel pyspark, I get this output:
MacBook:~ Agus$ jupyter console --kernel pyspark
/usr/bin/python: No module named IPython
Traceback (most recent call last):
File "/usr/local/bin/jupyter-console", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/jupyter_core/application.py", line 267, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/usr/local/lib/python2.7/site-packages/traitlets/config/application.py", line 595, in launch_instance
app.initialize(argv)
File "<decorator-gen-113>", line 2, in initialize
File "/usr/local/lib/python2.7/site-packages/traitlets/config/application.py", line 74, in catch_config_error
return method(app, *args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/jupyter_console/app.py", line 137, in initialize
self.init_shell()
File "/usr/local/lib/python2.7/site-packages/jupyter_console/app.py", line 110, in init_shell
client=self.kernel_client,
File "/usr/local/lib/python2.7/site-packages/traitlets/config/configurable.py", line 412, in instance
inst = cls(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/jupyter_console/ptshell.py", line 251, in __init__
self.init_kernel_info()
File "/usr/local/lib/python2.7/site-packages/jupyter_console/ptshell.py", line 305, in init_kernel_info
raise RuntimeError("Kernel didn't respond to kernel_info_request")
RuntimeError: Kernel didn't respond to kernel_info_request
Upvotes: 2
Views: 5846
Reputation: 4291
There are many ways to integrate PySpark with Jupyter Notebook.

1. Install Apache Toree:
pip install jupyter
pip install toree
jupyter toree install --spark_home=path/to/your/spark_directory --interpreters=PySpark
You can check the installation with
jupyter kernelspec list
You should get an entry for the Toree PySpark kernel:
apache_toree_pyspark /home/pauli/.local/share/jupyter/kernels/apache_toree_pyspark
Afterwards, if you want, you can install other interpreters like SparkR, Scala, and SQL:
jupyter toree install --interpreters=Scala,SparkR,SQL
2. Add these lines to your ~/.bashrc:
export SPARK_HOME=/path/to/spark-2.2.0
export PATH="$PATH:$SPARK_HOME/bin"
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
Then type pyspark in a terminal, and it will open a Jupyter Notebook with a SparkContext already initialized.
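As a quick sanity check (a minimal sketch; sc and spark are the variables the pyspark shell pre-creates in Spark 2.x), the first cell of that notebook should work without any imports:

# sc and spark are created by the pyspark launcher itself; no imports needed
sc.parallelize(range(100)).sum()   # should return 4950
spark.range(5).show()              # spark is the SparkSession (Spark 2.x)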
3. Install pyspark as a standalone Python package:
pip install pyspark
Now you can import pyspark like any other Python package.
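For example, a minimal sketch of a local session (the master local[*] and the app name are arbitrary example values):

from pyspark.sql import SparkSession

# Pip-installed pyspark bundles Spark itself, so no SPARK_HOME is required
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("example") \
    .getOrCreate()

spark.range(5).show()
spark.stop()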
Upvotes: 12
Reputation: 1326
The easiest way is to use findspark. First create an environment variable:
export SPARK_HOME="{full path to Spark}"
And then install findspark:
pip install findspark
Then launch jupyter notebook and the following should work:
import findspark
findspark.init()
import pyspark
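From there you can create a context as usual; a minimal sketch (the app name is an arbitrary example):

sc = pyspark.SparkContext(appName="findspark-test")
print(sc.parallelize([1, 2, 3]).count())  # prints 3
sc.stop()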
Upvotes: 5