Reputation: 703
I have a machine with JupyterHub (Python 2, Python 3, R, and Bash kernels). I have Spark (Scala) and, of course, PySpark working. I can even use PySpark inside an interactive IPython notebook with a command like:
IPYTHON_OPTS="notebook" $path/to/bin/pyspark
(this opens a Jupyter notebook, and inside Python 2 I can use Spark)
BUT I can't get PySpark working inside JupyterHub.
The Spark kernel is more than what I really need; I only need PySpark inside JupyterHub. Any suggestions?
Thanks.
Upvotes: 3
Views: 5866
Reputation: 1092
I have created a public gist that configures Spark 2.x with JupyterHub and a CDH 5.13 cluster.
Upvotes: 0
Reputation: 5586
You need to configure the pyspark kernel.
On my server, Jupyter kernels are located at:
/usr/local/share/jupyter/kernels/
You can create a new kernel by making a new directory:
mkdir /usr/local/share/jupyter/kernels/pyspark
Then create the kernel.json file - I paste mine as a reference:
{
  "display_name": "pySpark (Spark 1.6.0)",
  "language": "python",
  "argv": [
    "/usr/local/bin/python2.7",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "PYSPARK_PYTHON": "/usr/local/bin/python2.7",
    "SPARK_HOME": "/usr/lib/spark",
    "PYTHONPATH": "/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/",
    "PYTHONSTARTUP": "/usr/lib/spark/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "--master yarn-client pyspark-shell"
  }
}
Adjust the paths and Python versions, and your PySpark kernel is good to go.
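A typo in kernel.json tends to fail silently (the kernel just won't show up), so it can help to sanity-check that the file parses as valid JSON and has the fields Jupyter expects. A minimal sketch; the path and function name here are just for illustration:

```python
import json

# Keys Jupyter requires in every kernel spec
REQUIRED_KEYS = {"display_name", "language", "argv"}

def check_kernel_spec(path):
    """Load a kernel.json and verify the required keys are present."""
    with open(path) as f:
        spec = json.load(f)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - set(spec)
    if missing:
        raise ValueError("kernel.json missing keys: %s" % ", ".join(sorted(missing)))
    return spec

# Example (hypothetical path -- use wherever you created the kernel):
# spec = check_kernel_spec("/usr/local/share/jupyter/kernels/pyspark/kernel.json")
```

You can also run `jupyter kernelspec list` to confirm Jupyter actually sees the new kernel directory.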
Upvotes: 6
Reputation: 28683
You could start Jupyter as usual and add the following to the top of your code:
import sys

# Make Spark's Python sources and the bundled py4j importable
sys.path.insert(0, '<path>/spark/python/')
sys.path.insert(0, '<path>/spark/python/lib/py4j-0.8.2.1-src.zip')

import pyspark

conf = pyspark.SparkConf().set<conf settings>
sc = pyspark.SparkContext(conf=conf)
and change the parts in angle brackets as appropriate for your setup.
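Since the bundled py4j version changes between Spark releases, one way to avoid hard-coding it is to glob for the zip under your Spark install. A sketch of that idea; the function name is hypothetical, and the Spark home path is whatever applies on your machine:

```python
import glob
import os
import sys

def add_spark_to_path(spark_home):
    """Prepend Spark's Python sources and its bundled py4j zip to sys.path.

    The py4j version differs between Spark releases, so glob for the
    zip instead of hard-coding its file name.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    if not py4j_zips:
        raise RuntimeError("no py4j zip found under %s" % python_dir)
    sys.path.insert(0, py4j_zips[0])
    sys.path.insert(0, python_dir)

# Example (hypothetical path):
# add_spark_to_path("/usr/lib/spark")
# import pyspark
```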
Upvotes: 4
Reputation: 8449
I didn't try it with JupyterHub, but this approach helped me with other tools (like Spyder).
As I understand it, the Jupyter server is itself a Python script,
so:
copy (or rename) jupyterhub to jupyterhub.py
run:
spark-submit jupyterhub.py
(replace spark-submit and jupyterhub.py with the full path of those files)
Upvotes: 0