Reputation: 2011
My build.sbt file has this:
scalaVersion := "2.10.3"
libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.1.0"
I am running Spark in standalone cluster mode, and my SparkConf is:
SparkConf().setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077").setAppName("Simple Application")
(I am not using the setJars method; I am not sure whether I need it.)
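If I were to use it, I assume it would look roughly like this (the jar path is just my placeholder):
val conf = new SparkConf()
  .setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077")
  .setAppName("Simple Application")
  .setJars(Seq("target/scala-2.10/[jarname]_2.10-1.0.jar")) // jars to ship to the executors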
I package the jar using the command sbt package. The command I use to run the application is:
./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 --class "[classname]" target/scala-2.10/[jarname]_2.10-1.0.jar
On running this, I get this error:
java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
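For context, the read that triggers it is the usual spark-csv call, roughly like this (a sketch assuming Spark 1.4+'s DataFrameReader; the path is a placeholder):
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.csv") // the data source class the error refers to
  .option("header", "true")
  .load("[path-to-csv]")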
What's the issue?
Upvotes: 6
Views: 13354
Reputation: 109
Use the command below; it works:
spark-submit --class ur_class_name --master local[*] --packages com.databricks:spark-csv_2.10:1.4.0 project_path/target/scala-2.10/jar_name.jar
Upvotes: 0
Reputation: 191
Use the dependencies appropriate to your build. For example, with Maven:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>
Upvotes: 3
Reputation: 103
Here is the example that worked:
spark-submit --jars file:/root/Downloads/jars/spark-csv_2.10-1.0.3.jar,file:/root/Downloads/jars/commons-csv-1.2.jar,file:/root/Downloads/jars/spark-sql_2.11-1.4.1.jar --class "SampleApp" --master local[2] target/scala-2.11/my-proj_2.11-1.0.jar
Upvotes: 0
Reputation: 575
Add the --jars option and download the jars below from a repository such as search.maven.org:
--jars commons-csv-1.1.jar,spark-csv-csv.jar,univocity-parsers-1.5.1.jar \
Using the --packages option, as claudiaann1 suggested, also works if you have internet access without a proxy. If you need to go through a proxy, it won't work.
Upvotes: 0
Reputation: 237
Include the option --packages com.databricks:spark-csv_2.10:1.2.0, placing it after the --class argument and before the application jar (target/...).
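Applied to the command from the question (placeholders kept as-is), that would be:
./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 --class "[classname]" --packages com.databricks:spark-csv_2.10:1.2.0 target/scala-2.10/[jarname]_2.10-1.0.jar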
Upvotes: 1
Reputation: 680
Have you tried using the --packages argument with spark-submit? I've run into this issue when Spark doesn't pick up the dependencies listed in libraryDependencies (sbt package does not bundle them into the application jar).
Try this:
./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 \
  --packages com.databricks:spark-csv_2.10:1.1.0 \
  --class "[classname]" target/scala-2.10/[jarname]_2.10-1.0.jar
From the Spark Docs:
Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates with --packages. All transitive dependencies will be handled when using this command.
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
Upvotes: -2