kamalbanga

Reputation: 2011

Failed to load class for data source: com.databricks.spark.csv

My build.sbt file has this:

scalaVersion := "2.10.3"
libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.1.0"

I am running Spark in standalone cluster mode and my SparkConf is SparkConf().setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077").setAppName("Simple Application") (I am not using the method setJars, not sure whether I need it).

I package the jar using the command sbt package. Command I use to run the application is ./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 --class "[classname]" target/scala-2.10/[jarname]_2.10-1.0.jar.
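For context, `sbt package` builds a jar containing only the application's own classes; third-party dependencies such as spark-csv are not bundled into it. One way to produce a self-contained jar is the sbt-assembly plugin — a sketch only, assuming that plugin is added to the project (the plugin version shown is illustrative):

```scala
// project/plugins.sbt — enable the sbt-assembly plugin (assumed; version illustrative)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```

```scala
// build.sbt — same dependencies as above; `sbt assembly` then emits a fat jar
scalaVersion := "2.10.3"
libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.1.0"
```

Running `sbt assembly` instead of `sbt package` would then produce a jar under `target/scala-2.10/` that includes spark-csv, so spark-submit needs no extra flags for it.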

On running this, I get this error:

java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv

What's the issue?

Upvotes: 6

Views: 13354

Answers (6)

Raghav

Reputation: 109

Use the command below; it works:

spark-submit --class ur_class_name --master local[*] --packages com.databricks:spark-csv_2.10:1.4.0 project_path/target/scala-2.10/jar_name.jar

Upvotes: 0

Thilina Piyadasun

Reputation: 191

Use matching dependency versions. For example (Maven):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>
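With those dependencies on the classpath, the data source can be used from Spark SQL. A minimal sketch of reading a CSV with the spark-csv source under the Spark 1.x API (the file path is hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Standard Spark 1.x setup; master/app name come from spark-submit or conf
val conf = new SparkConf().setAppName("CsvExample")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Load a CSV through the com.databricks.spark.csv data source
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // treat first line as column names
  .option("inferSchema", "true") // infer column types from the data
  .load("data/example.csv")      // hypothetical path

df.printSchema()
```

If the spark-csv jar is missing from the classpath at runtime, the `.format("com.databricks.spark.csv")` call is exactly where the "Failed to load class for data source" error surfaces.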

Upvotes: 3

Venkataramana

Reputation: 103

Here is an example that worked:

spark-submit --jars file:/root/Downloads/jars/spark-csv_2.10-1.0.3.jar,file:/root/Downloads/jars/commons-csv-1.2.jar,file:/root/Downloads/jars/spark-sql_2.11-1.4.1.jar --class "SampleApp" --master local[2] target/scala-2.11/my-proj_2.11-1.0.jar

Upvotes: 0

Paul Z Wu

Reputation: 575

Add the --jars option and download the jars below from a repository such as search.maven.org:

--jars commons-csv-1.1.jar,spark-csv-csv.jar,univocity-parsers-1.5.1.jar \

Using the --packages option as claudiaann1 suggested also works, provided you have internet access without a proxy. If you need to go through a proxy, it won't work.

Upvotes: 0

claudiaann1

Reputation: 237

Include the option --packages com.databricks:spark-csv_2.10:1.2.0, placed after the --class argument and before the path to the jar.

Upvotes: 1

dayman

Reputation: 680

Have you tried the --packages argument with spark-submit? I've run into this issue where Spark doesn't respect the dependencies listed in libraryDependencies.

Try this:

./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 \
                   --packages com.databricks:spark-csv_2.10:1.1.0 \
                   --class "[classname]" target/scala-2.10/[jarname]_2.10-1.0.jar

Note that --packages, like all spark-submit options, must come before the path to the application jar; anything after the jar is passed to your application as arguments.


From the Spark Docs:

Users may also include any other dependencies by supplying a comma-delimited list of maven coordinates with --packages. All transitive dependencies will be handled when using this command.

https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management

Upvotes: -2
