Reputation: 41
I would like to ask for help with the Anaconda Jupyter Notebook. I want to write PySpark and SparkR in Jupyter notebooks, so I followed an online tutorial that explains how to install Apache Toree together with Jupyter Notebook.
I am using the Cloudera Manager parcels to manage my kerberized Hadoop cluster.
However, I can't start the Apache Toree PySpark kernel; the server log shows the error below.
[I 15:24:50.529 NotebookApp] Creating new notebook in
[I 15:24:52.079 NotebookApp] Kernel started: 8cb4838c-2171-4672-96a4-b21ef191ffc6
Starting Spark Kernel with SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p2024.2115/lib/spark) overrides detected (/opt/cloudera/parcels/CDH/lib/spark).
WARNING: Running spark-class from user-defined location.
Exception in thread "main" java.lang.NoSuchMethodError: joptsimple.OptionParser.acceptsAll(Ljava/util/Collection;Ljava/lang/String;)Ljoptsimple/OptionSpecBuilder;
at org.apache.toree.boot.CommandLineOptions.<init>(CommandLineOptions.scala:37)
at org.apache.toree.Main$delayedInit$body.apply(Main.scala:25)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at org.apache.toree.Main$.main(Main.scala:24)
at org.apache.toree.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I've put jopt-simple-4.5.jar in both the Toree lib directory and the Spark home. Is there anywhere else I have to put the JAR so that it can be found when creating a new notebook? Thanks.
Best regards, Ruka
Upvotes: 2
Views: 1292
Reputation: 145
The simplest solution I found is to add the following options to spark-submit:
--conf "spark.driver.extraClassPath=/usr/local/share/jupyter/kernels/apache_toree_scala/lib/toree-assembly-0.1.0-incubating.jar" --conf "spark.executor.extraClassPath=/usr/local/share/jupyter/kernels/apache_toree_scala/lib/toree-assembly-0.1.0-incubating.jar"
This can be added either to the __TOREE_SPARK_OPTS__
variable of the /usr/local/share/jupyter/kernels/apache_toree_scala/kernel.json
file or directly to the bash command in the /usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh
file.
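For example, the kernel.json might end up looking like the sketch below (the kernel name, paths, and Toree assembly version are from my setup and will likely differ on yours; adjust them to match your installation):

```json
{
  "display_name": "Apache Toree - Scala",
  "language": "scala",
  "argv": [
    "/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh",
    "--profile",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/opt/cloudera/parcels/CDH/lib/spark",
    "__TOREE_SPARK_OPTS__": "--conf \"spark.driver.extraClassPath=/usr/local/share/jupyter/kernels/apache_toree_scala/lib/toree-assembly-0.1.0-incubating.jar\" --conf \"spark.executor.extraClassPath=/usr/local/share/jupyter/kernels/apache_toree_scala/lib/toree-assembly-0.1.0-incubating.jar\""
  }
}
```

After editing the file, restart the notebook server (or at least the kernel) so that the new environment variable is picked up.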
Adding this forces the classloader to load joptsimple.OptionParser from the Toree assembly JAR rather than from the default CDH libraries.
P. S. Here is a Toree version that is compatible with CDH 5.10.0: https://github.com/Myllyenko/incubator-toree/releases
Upvotes: 2