Reputation: 1591
From the Spark downloads page, if I download the tar file for v2.0.1, I see that it contains some jars that I find useful to include in my app.
If I download the tar file for v1.6.2 instead, I don't find the jars folder in there. Is there an alternate package type I should use from that site? I am currently choosing the default (pre-built for Hadoop 2.6). Alternatively, where can I find those Spark jars - should I get each of them individually from http://spark-packages.org?
Here is an indicative bunch of jars I want to use:
Upvotes: 3
Views: 12501
Reputation: 9067
The way Spark ships its run-time has changed from V1 to V2. In V2, by default, Spark ships as a set of individual JARs under $SPARK_HOME/jars. In V1, by default, Spark ships as one big spark-assembly*.jar under $SPARK_HOME/lib that contains all the dependencies.
I believe you can change the default behavior, but that would require recompiling Spark on your own...
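You can see the difference directly in an unpacked pre-built download (a minimal sketch, assuming SPARK_HOME points at the extracted directory; the exact file names depend on the version and Hadoop profile you chose):

# Spark 2.x: many individual module JARs
ls "$SPARK_HOME/jars" | head

# Spark 1.x: a single fat assembly JAR that bundles everything
ls "$SPARK_HOME/lib"/spark-assembly*.jar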
And also, about spark-csv specifically: you have to download spark-csv (for Scala 2.10) from Spark-Packages.org plus commons-csv from Commons.Apache.org, and add both JARs to your CLASSPATH (with --jars on the command line, or with prop spark.driver.extraClassPath plus instruction sc.addJar() if the command line does not work for some reason).
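For example, launching a shell with both JARs on the classpath could look like this (a sketch only; the JAR file names are illustrative and should match whatever versions you actually downloaded):

# option 1: pass both JARs with --jars
$SPARK_HOME/bin/spark-shell \
  --jars spark-csv_2.10-1.5.0.jar,commons-csv-1.4.jar

# option 2: put them on the driver classpath via the property mentioned above
$SPARK_HOME/bin/spark-shell \
  --conf spark.driver.extraClassPath=spark-csv_2.10-1.5.0.jar:commons-csv-1.4.jar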
For reference, this is how $SPARK_HOME/bin/spark-class builds the launch classpath as of Spark 2.1.x (greatly simplified):
# Find Spark jars
SPARK_JARS_DIR="${SPARK_HOME}/jars"
LAUNCH_CLASSPATH="$SPARK_JARS_DIR/*"
And as of Spark 1.6.x:
# Find assembly jar
ASSEMBLY_DIR="${SPARK_HOME}/lib"
ASSEMBLY_JARS="$(ls -1 "$ASSEMBLY_DIR" | grep "^spark-assembly.*hadoop.*\.jar$" || true)"
SPARK_ASSEMBLY_JAR="${ASSEMBLY_DIR}/${ASSEMBLY_JARS}"
LAUNCH_CLASSPATH="$SPARK_ASSEMBLY_JAR"
Upvotes: 9