WestCoastProjects

Reputation: 63281

pyspark kernel on Jupyter generates "spark not found" error

I have a few Jupyter notebooks using the pyspark kernel that had been working for months but recently stopped working. The pyspark kernel itself starts: it shows the blue message:

    Kernel Loaded

and we can see that the kernel is available (screenshot not preserved).

But I noticed this in the jupyter log:

[IPKernelApp] WARNING | Unknown error in handling PYTHONSTARTUP file /shared/spark/python/pyspark/shell.py:

And when attempting to do some work in Spark, we get:

---> 18     df = spark.read.parquet(path)
     19     if count: p(tname + ": count="+str(df.count()))
     20     df.createOrReplaceTempView(tname)

NameError: name 'spark' is not defined

with no further information.

Note: the Scala Spark kernel (via Toree) reads that same Parquet file successfully, using the same code.

So what might be going wrong with the Jupyter pyspark kernel?

Upvotes: 0

Views: 2111

Answers (1)

WestCoastProjects

Reputation: 63281

Got it! I had upgraded Spark, and the pyspark kernel did not know about it.

First, check which kernels are installed:

$jupyter kernelspec list

Available kernels:
  python2        /Users/sboesch/Library/Python/2.7/lib/python/site-packages/ipykernel/resources
  ir             /Users/sboesch/Library/Jupyter/kernels/ir
  julia-1.0      /Users/sboesch/Library/Jupyter/kernels/julia-1.0
  scala          /Users/sboesch/Library/Jupyter/kernels/scala
  scijava        /Users/sboesch/Library/Jupyter/kernels/scijava
  pyspark        /usr/local/share/jupyter/kernels/pyspark
  spark_scala    /usr/local/share/jupyter/kernels/spark_scala

Let's examine the pyspark kernel:

sudo vim  /usr/local/share/jupyter/kernels/pyspark/kernel.json
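
If you would rather not open an editor, the environment block of the kernelspec can be printed directly. This is only a minimal sketch: it assumes the variables live under the kernelspec's `env` key (some setups pass them through `argv` instead), and `load_kernel_env` is a hypothetical helper name, not part of Jupyter.

```python
import json
from pathlib import Path


def load_kernel_env(kernel_json_path):
    """Return the "env" mapping from a kernel.json file, or {} if it has none."""
    spec = json.loads(Path(kernel_json_path).read_text())
    return spec.get("env", {})


if __name__ == "__main__":
    # Path taken from the `jupyter kernelspec list` output above.
    path = Path("/usr/local/share/jupyter/kernels/pyspark/kernel.json")
    if path.exists():
        for name, value in sorted(load_kernel_env(path).items()):
            print(f"{name}={value}")
    else:
        print(f"{path} not found")
```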

Of particular interest is the py4j archive referenced in PYTHONPATH:

PYTHONPATH="/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"

Is it available?

$ll "/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"
ls: /shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip: No such file or directory
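
Note that the `ls` above was handed the whole colon-separated value as a single name, so it cannot say which entry is at fault. Checking each entry separately narrows it down. A minimal sketch, where `missing_path_entries` is a hypothetical helper name:

```python
import os


def missing_path_entries(pythonpath, sep=os.pathsep):
    """Return the entries of a PYTHONPATH-style string that do not exist on disk."""
    return [
        entry
        for entry in pythonpath.split(sep)
        if entry and not os.path.exists(entry)
    ]


if __name__ == "__main__":
    # Value copied from the kernel.json shown above.
    value = "/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.4-src.zip"
    for entry in missing_path_entries(value, sep=":"):
        print("missing:", entry)
```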

No, it is not: the upgraded Spark ships a different py4j version. So let's find the one that is present and update the path:

 $ll /shared/spark/python/lib/py4j*
-rw-r--r--@ 1 sboesch  wheel  42437 Jun  1 13:49 /shared/spark/python/lib/py4j-0.10.7-src.zip


PYTHONPATH="/shared/spark/python/:/shared/spark/python/lib/py4j-0.10.7-src.zip"
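
Since the py4j version in the file name changes with each Spark release, the value can also be derived with a glob instead of being typed by hand. The following is a sketch under assumptions: `build_pythonpath` is a hypothetical helper, and it expects the `python/lib/py4j-*-src.zip` layout seen in the listing above. If several archives match, it takes the last one in lexicographic order, which is not a true version comparison.

```python
import glob
import os


def build_pythonpath(spark_home):
    """Build a PYTHONPATH value from whichever py4j source zip is installed."""
    python_dir = os.path.join(spark_home, "python")
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    zips = sorted(glob.glob(pattern))
    if not zips:
        raise FileNotFoundError(f"no py4j archive matches {pattern}")
    # Normally only one archive is present; pick the last match otherwise.
    return os.pathsep.join([python_dir, zips[-1]])


if __name__ == "__main__":
    try:
        print(build_pythonpath("/shared/spark"))
    except FileNotFoundError as err:
        print(err)
```

The printed value can then be pasted into kernel.json in place of the stale entry.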

After this I restarted Jupyter, and the pyspark kernel worked again.

Upvotes: 1
