Dimon Buzz

Reputation: 1298

How to integrate Jupyter notebook scala kernel with apache spark?

I installed the Scala kernel following this doc: https://github.com/jupyter-scala/jupyter-scala The kernel is listed:

$ jupyter kernelspec list
Available kernels:
  python3     /usr/local/homebrew/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel/resources
  scala       /Users/bobyfarell/Library/Jupyter/kernels/scala

When I try to use Spark in the notebook I get this:

val sparkHome = "/opt/spark-2.3.0-bin-hadoop2.7"
val scalaVersion = scala.util.Properties.versionNumberString
import org.apache.spark.ml.Pipeline

Compilation Failed
Main.scala:57: object apache is not a member of package org
 ; import org.apache.spark.ml.Pipeline
              ^

I tried several suggested fixes, but none of them helped. Please note that I don't want to use Toree; I want to use a standalone Spark installation with the Scala kernel in Jupyter. A similar issue is reported here too: https://github.com/jupyter-scala/jupyter-scala/issues/63

Upvotes: 1

Views: 3971

Answers (1)

Joe Pallas

Reputation: 2155

It doesn't look like you are following the jupyter-scala directions for using Spark. Setting `sparkHome` in a cell has no effect: the kernel does not pick up the Spark jars from a local installation. You have to load Spark into the kernel itself using its special `$ivy` imports, which fetch the Spark artifacts and add them to the notebook's classpath.
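Roughly, a first cell along these lines should make `org.apache.spark` resolvable (a sketch based on the jupyter-scala README; the exact Spark and jupyter-scala module versions are assumptions you should match to your setup):

```scala
// Fetch Spark onto the kernel's classpath via Ammonite-style magic imports.
// Adjust "2.3.0" to your Spark version; "0.4.2" is the jupyter-scala spark
// module version current at the time (an assumption - check the README).
import $ivy.`org.apache.spark::spark-sql:2.3.0`
import $ivy.`org.jupyter-scala::spark:0.4.2` // provides JupyterSparkSession

import org.apache.spark.sql._
import jupyter.spark.session._

// Build the session with JupyterSparkSession rather than SparkSession,
// so it is aware of the jupyter-scala kernel.
val spark = JupyterSparkSession.builder()
  .jupyter()
  .master("local[*]")
  .appName("notebook")
  .getOrCreate()

// Once the jars are loaded, imports like this compile:
import org.apache.spark.ml.Pipeline
```

Note that `import $ivy` only works inside the jupyter-scala/Ammonite kernel, not in plain Scala; that is why pointing at `/opt/spark-2.3.0-bin-hadoop2.7` from a cell doesn't help.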

Upvotes: 1
