Reputation: 63281
I have a few pyspark kernel jupyter notebooks that had been working for months - but recently stopped working. The pyspark kernel itself loads: it gives the blue message:
Kernel Loaded
.. and we can see the kernel is available. But I noticed this in the jupyter log:
[IPKernelApp] WARNING | Unknown error in handling PYTHONSTARTUP file /shared/spark/python/pyspark/shell.py:
And when attempting to do some work in spark we get:
---> 18 df = spark.read.parquet(path)
19 if count: p(tname + ": count="+str(df.count()))
20 df.createOrReplaceTempView(tname)
NameError: name 'spark' is not defined
with no further information.
Note: the scala spark kernel (using toree) is able to read that same parquet file successfully - using the same code, in fact.
So what might be going on with the jupyter pyspark kernel?
Upvotes: 0
Views: 2111
Reputation: 63281
Got it! I had upgraded spark and the pyspark kernel did not know about it.
First, check which kernels are installed:
$jupyter kernelspec list
Available kernels:
python2 /Users/sboesch/Library/Python/2.7/lib/python/site-packages/ipykernel/resources
ir /Users/sboesch/Library/Jupyter/kernels/ir
julia-1.0 /Users/sboesch/Library/Jupyter/kernels/julia-1.0
scala /Users/sboesch/Library/Jupyter/kernels/scala
scijava /Users/sboesch/Library/Jupyter/kernels/scijava
pyspark /usr/local/share/jupyter/kernels/pyspark
spark_scala /usr/local/share/jupyter/kernels/spark_scala
Let's examine the pyspark kernel:
sudo vim /usr/local/share/jupyter/kernels/pyspark/kernel.json
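For reference, a pyspark kernel.json typically looks roughly like this - the paths and the py4j version here are illustrative, not the exact contents of my file:

```json
{
  "display_name": "PySpark",
  "language": "python",
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/shared/spark",
    "PYTHONSTARTUP": "/shared/spark/python/pyspark/shell.py",
    "PYTHONPATH": "/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"
  }
}
```

The `env` block is the part that matters here: `PYTHONSTARTUP` runs `shell.py` (which defines `spark`), and `PYTHONPATH` must point at pyspark and a py4j zip that actually exist.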
Of particular interest is the py4j zip file on the PYTHONPATH:
PYTHONPATH="/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"
Is it available?
$ll "/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"
ls: /shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip: No such file or directory
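(Strictly speaking, `ls` is handed the entire colon-joined PYTHONPATH as one filename, so it would fail even if both entries existed. A small sketch that checks each entry separately - the path value is copied from the kernel spec above:)

```python
import os

def missing_entries(pythonpath):
    """Return the PYTHONPATH entries that do not exist on disk."""
    return [p for p in pythonpath.split(os.pathsep) if p and not os.path.exists(p)]

# Value taken from kernel.json; the stale py4j zip shows up as missing.
stale = "/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"
print(missing_entries(stale))
```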
No it is not - the spark upgrade replaced that zip. Let's find the version that is actually there:
$ll /shared/spark/python/lib/py4j*
-rw-r--r--@ 1 sboesch wheel 42437 Jun 1 13:49 /shared/spark/python/lib/py4j-0.10.7-src.zip
and update the PYTHONPATH in kernel.json to point at the new zip:
PYTHONPATH="/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.7-src.zip"
After restarting jupyter, the pyspark kernel is working again.
Upvotes: 1