Curtis Chong

Reputation: 811

How to import libraries in Spark Notebook

I'm having trouble importing magellan-1.0.4-s_2.11 in Spark Notebook. I've downloaded the jar from https://spark-packages.org/package/harsha2010/magellan and have tried placing SPARK_HOME/bin/spark-shell --packages harsha2010:magellan:1.0.4-s_2.11 in the "Start of Customized Settings" section of the spark-notebook script in the bin folder.

Here are my imports

import magellan.{Point, Polygon, PolyLine}
import magellan.coord.NAD83
import org.apache.spark.sql.magellan.MagellanContext
import org.apache.spark.sql.magellan.dsl.expressions._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

And my errors...

<console>:71: error: object Point is not a member of package org.apache.spark.sql.magellan
       import magellan.{Point, Polygon, PolyLine}
              ^
<console>:72: error: object coord is not a member of package org.apache.spark.sql.magellan
       import magellan.coord.NAD83
                       ^
<console>:73: error: object MagellanContext is not a member of package org.apache.spark.sql.magellan
       import org.apache.spark.sql.magellan.MagellanContext

I then tried to import the new library like any other library by placing it into the main script like so:

$lib_dir/magellan-1.0.4-s_2.11.jar"

This didn't work and I'm left scratching my head wondering what I've done wrong. How do I import libraries such as magellan into spark notebook?

Upvotes: 9

Views: 6464

Answers (3)

OMN

Reputation: 11

The easy way is to set (or extend) the EXTRA_CLASSPATH environment variable so that it points to the .jar file you downloaded: export EXTRA_CLASSPATH=</link/to/your.jar> on Linux/macOS, or set EXTRA_CLASSPATH=</link/to/your.jar> on Windows. Here you can find the detailed solution.
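Once the notebook server has been restarted with EXTRA_CLASSPATH set, one way to confirm the jar is actually visible, before running the imports, is to look up one of its classes by name from a notebook cell. This is a minimal sketch, assuming that magellan.Point (taken from the question's imports) is a class packaged in the jar; ClasspathCheck and isOnClasspath are hypothetical names, not part of Spark Notebook or magellan.

```scala
// Hypothetical helper: reports whether a class can be loaded by name.
// Assumption: the class you probe (e.g. "magellan.Point") is a real class in the jar.
object ClasspathCheck {
  def isOnClasspath(className: String): Boolean =
    try {
      // Class.forName throws ClassNotFoundException when the class cannot be found
      Class.forName(className)
      true
    } catch {
      case _: ClassNotFoundException => false
    }
}
```

Calling ClasspathCheck.isOnClasspath("magellan.Point") in a cell should return true once the jar has been picked up; false suggests the imports will keep failing regardless of how they are written.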

Upvotes: 0

0asa

Reputation: 224

I would suggest checking these:

https://github.com/spark-notebook/spark-notebook/blob/master/docs/metadata.md#import-download-dependencies

and

https://github.com/spark-notebook/spark-notebook/blob/master/docs/metadata.md#add-spark-packages

I think the :dp magic command is deprecated; instead, you should add your custom dependencies to the notebook metadata. Go to Edit > Edit notebook metadata in the menu and add something like:

"customDeps": [
   "harsha2010 % magellan % 1.0.4-s_2.11"
]

Once done, you will need to restart the kernel. You can check in the browser console whether the package is being downloaded properly.
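The customDeps entry uses the "group % artifact % version" form shown above, whereas spark-shell --packages (tried in the question) and spark-packages.org give the colon-separated form group:artifact:version. A small sketch of the conversion, assuming a plain three-part coordinate with no classifier; DepFormat and toCustomDep are hypothetical helpers, not part of Spark Notebook.

```scala
// Hypothetical helper: turns a --packages style coordinate ("group:artifact:version")
// into the "group % artifact % version" string used in the customDeps metadata.
// Assumption: exactly three non-empty parts, no classifier or packaging suffix.
object DepFormat {
  def toCustomDep(coordinate: String): Either[String, String] =
    coordinate.split(":") match {
      case Array(group, artifact, version) if Seq(group, artifact, version).forall(_.nonEmpty) =>
        Right(s"$group % $artifact % $version")
      case _ =>
        Left(s"expected group:artifact:version, got '$coordinate'")
    }
}
```

For example, DepFormat.toCustomDep("harsha2010:magellan:1.0.4-s_2.11") returns Right("harsha2010 % magellan % 1.0.4-s_2.11"), which matches the customDeps entry above.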

Upvotes: 1

Mateusz Kubuszok

Reputation: 27535

Try evaluating something like:

:dp "harsha2010" % "magellan" % "1.0.4-s_2.11"

It loads the library into Spark so that it can be imported, assuming it can be obtained through the Maven repositories. In my case it failed with this message:

failed to load 'harsha2010:magellan:jar:1.0.4-s_2.11 (runtime)' from ["Maven2 local (file:/home/dev/.m2/repository/, releases+snapshots) without authentication", "maven-central (http://repo1.maven.org/maven2/, releases+snapshots) without authentication", "spark-packages (http://dl.bintray.com/spark-packages/maven/, releases+snapshots) without authentication", "oss-sonatype (https://oss.sonatype.org/content/repositories/releases/, releases+snapshots) without authentication"] into /tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786

I think the file was too big and the connection was interrupted before the whole file could be downloaded.

Workaround

So I downloaded the JAR manually from:

http://dl.bintray.com/spark-packages/maven/harsha2010/magellan/1.0.4-s_2.11/

and copied it into:

/tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786/harsha2010/magellan/1.0.4-s_2.11

After that, the :dp command worked. Try calling it first, and if it fails, copy the JAR into the right path to make things work.
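The target directory in this workaround follows the usual Maven2 repository layout (the Bintray URL above uses the same pattern): the group id with dots turned into slashes, then the artifact id, then the version, with the JAR named artifact-version.jar inside it. A minimal sketch for computing where the JAR should go, assuming the notebook's aether cache uses that standard layout; RepoLayout and jarPath are hypothetical names, and the repository root is whatever directory your own :dp error message reports.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper: builds the path a JAR occupies in a Maven2-style
// repository rooted at repoRoot (e.g. the aether directory from the :dp error).
// Assumption: the JAR file is named "<artifact>-<version>.jar".
object RepoLayout {
  def jarPath(repoRoot: String, group: String, artifact: String, version: String): Path = {
    // Dots in the group id become nested directories: "org.foo" -> org/foo
    val groupDirs = group.split('.')
    val segments = groupDirs ++ Array(artifact, version, s"$artifact-$version.jar")
    Paths.get(repoRoot, segments: _*)
  }
}
```

For example, on a Unix-like system RepoLayout.jarPath("/tmp/repo", "harsha2010", "magellan", "1.0.4-s_2.11") gives /tmp/repo/harsha2010/magellan/1.0.4-s_2.11/magellan-1.0.4-s_2.11.jar.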

Better solution

I should investigate why the download failed and fix it in the first place, or put that library in my local M2 repo. But this should get you going.

Upvotes: 1
