cogm

Reputation: 285

Spark Can't Find JDBC Driver from SBT

I'm trying to use JDBC in a Scala Spark application, and I'm compiling with sbt. However, when I add the line Class.forName("com.mysql.jdbc.Driver"), it throws a ClassNotFoundException.

My sbt file is this:

name := "SparkApp"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"
libraryDependencies += "com.databricks" %% "spark-csv" % "1.5.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "6.0.5"

As far as I can tell, that last line is all I should need to add the JDBC driver, but it doesn't seem to be working. I've also tried Class.forName("com.mysql.jdbc.Driver").newInstance(), but it has the same result, so I assume the issue is that the JDBC classes aren't being added to the classpath at all.
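For context, the failing call is the usual explicit driver registration done before opening a JDBC connection; a minimal sketch (the connection URL and credentials are placeholders, not from the original app):

import java.sql.DriverManager

object JdbcCheck {
  def main(args: Array[String]): Unit = {
    // The line from the question: explicit driver registration, which throws
    // ClassNotFoundException when the connector jar is not on the runtime classpath.
    Class.forName("com.mysql.jdbc.Driver")

    // Placeholder connection details, only to show where the driver is actually used.
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/mydb", "user", "password")
    conn.close()
  }
}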

Upvotes: 2

Views: 4284

Answers (4)

Vineet Srivastava

Reputation: 23

You should pass the driver jar when submitting the Spark job, like this:

1) spark-submit --jars mysql-connector-java-5.1.39.jar plus the rest of the parameters you are already passing

2) If you just want to try it locally in the shell: spark-shell --jars mysql-connector-java-5.1.39.jar

Update the driver jar to the version you actually have available, and provide the absolute path to it.

Upvotes: 0

desaiankitb

Reputation: 1052

spark-submit \
  --class com.mypack.MyClass \
  --master yarn --deploy-mode cluster \
  --conf spark.executor.extraClassPath=$POSTGRESQL_JAR_PATH:$MYSQL_JAR_PATH \
  --conf spark.driver.extraClassPath=$POSTGRESQL_JAR_PATH:$MYSQL_JAR_PATH \

where $POSTGRESQL_JAR_PATH and $MYSQL_JAR_PATH should be set to the HDFS paths of the jar files.

Hope this helps.

Use spark.executor.extraClassPath if you are running in cluster mode, and spark.driver.extraClassPath if you are running locally.

I recommend setting both options, to be on the safer side.

Upvotes: 0

sgvd

Reputation: 3939

You don't need to supply the class name to use JDBC to load data frames. Following the Spark SQL documentation, you only have to supply "jdbc" as the data source format (and indeed add the connector as a dependency) and set the right options:

val host: String = ???
val port: Int = ???
val database: String = ???
val table: String = ???
val user: String = ???
val password: String = ???

val options = Map(
      "url" -> s"jdbc:mysql://$host:$port/$database?zeroDateTimeBehavior=convertToNull",
      "dbtable" -> table,
      "user" -> user,
      "password" -> password)

val df = spark.read.format("jdbc").options(options).load()
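Equivalently, the same options can be passed by chaining .option calls on the reader; a short sketch using the same placeholder values defined above:

val df2 = spark.read
  .format("jdbc")
  .option("url", s"jdbc:mysql://$host:$port/$database?zeroDateTimeBehavior=convertToNull")
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .load()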

When you submit your application to Spark, you have to either include the MySQL connector in your final jar file or tell spark-submit to pull in the package as a dependency:

spark-submit --packages mysql:mysql-connector-java:6.0.5 ...

This flag also works with spark-shell and pyspark.

Upvotes: 2

dumitru

Reputation: 2108

Your MySQL driver class com.mysql.jdbc.Driver is not present on your classpath at runtime. If you are running your Spark job with spark-submit, then you have at least two options:

  • provide the --jars option to specify the path to the mysql-*.jar (see this post) (if both the workers and the driver need the class, take a close look at spark.executor.extraClassPath and spark.driver.extraClassPath)
  • build an uber jar (fat jar) that includes the mysql-* classes in your application jar (see this post); a minimal sketch of that setup follows this list
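For the fat-jar route, here is a minimal sbt-assembly sketch (the plugin version and the "provided" scoping are assumptions; adapt them to your setup):

// project/plugins.sbt — add the sbt-assembly plugin
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt — mark the Spark artifacts "provided" so they are not bundled,
// while mysql-connector-java keeps the default compile scope and ends up in the fat jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"
libraryDependencies += "mysql" % "mysql-connector-java" % "6.0.5"

Then build with sbt assembly and submit the resulting jar from target/scala-2.11/; with the driver baked into the application jar, Class.forName can resolve it at runtime.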

Upvotes: 0
