Reputation: 285
I'm trying to use JDBC in a Scala Spark application, and I'm compiling with sbt. However, when I add the line Class.forName("com.mysql.jdbc.Driver"), it throws a ClassNotFoundException.
My sbt file is this:
name := "SparkApp"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"
libraryDependencies += "com.databricks" %% "spark-csv" % "1.5.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "6.0.5"
As far as I can tell, that last line is all I should need to add the JDBC driver, but it doesn't seem to be working. I've also tried Class.forName("com.mysql.jdbc.Driver").newInstance(), but it has the same result, so I assume the issue is that the JDBC classes are not being added correctly at all.
Upvotes: 2
Views: 4284
Reputation: 23
You should pass the driver jar while submitting the Spark job, like below:
1) spark-submit --jars mysql-connector-java-5.1.39.jar plus the rest of the parameters you are already passing
2) If you just want to try it locally in the shell: spark-shell --jars mysql-connector-java-5.1.39.jar
Update the driver jar to the version you actually have available and provide the absolute path to it.
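For example, a full invocation might look like the sketch below; the main class, master, and jar locations are placeholders for illustration, not taken from the question:

# Hypothetical paths and class name; adjust to your project
spark-submit \
  --class com.example.SparkApp \
  --master local[*] \
  --jars /absolute/path/to/mysql-connector-java-5.1.39.jar \
  target/scala-2.11/sparkapp_2.11-1.0.jar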
Upvotes: 0
Reputation: 1052
spark-submit \
--class com.mypack.MyClass \
--master yarn --deploy-mode cluster \
--conf spark.executor.extraClassPath=$POSTGRESQL_JAR_PATH:$MYSQL_JAR_PATH \
--conf spark.driver.extraClassPath=$POSTGRESQL_JAR_PATH:$MYSQL_JAR_PATH \
where $POSTGRESQL_JAR_PATH and $MYSQL_JAR_PATH should be set to the HDFS paths of the jar files.
Hope this helps.
Use spark.executor.extraClassPath if you are running it in cluster mode, and spark.driver.extraClassPath if you are running it locally.
I recommend setting both options to be on the safe side.
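If you prefer not to repeat the --conf flags on every submission, the same properties can be set in conf/spark-defaults.conf; a minimal sketch with placeholder paths (not taken from this answer):

# Placeholder jar locations; point these at the actual connector jars
spark.executor.extraClassPath  /path/to/postgresql.jar:/path/to/mysql-connector-java.jar
spark.driver.extraClassPath    /path/to/postgresql.jar:/path/to/mysql-connector-java.jar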
Upvotes: 0
Reputation: 3939
You don't need to supply the class name to use JDBC to load data frames. Following the Spark SQL documentation, you only have to supply "jdbc" as the data source format (and indeed add the connector as a dependency) and set the right options:
val host: String = ???
val port: Int = ???
val database: String = ???
val table: String = ???
val user: String = ???
val password: String = ???
val options = Map(
"url" -> s"jdbc:mysql://$host:$port/$database?zeroDateTimeBehavior=convertToNull",
"dbtable" -> table,
"user" -> user,
"password" -> password)
val df = spark.read.format("jdbc").options(options).load()
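Note that spark here is assumed to be an existing SparkSession; a minimal sketch of creating one in Spark 2.x (the application name is arbitrary):

import org.apache.spark.sql.SparkSession

// Build or reuse the session; the master is typically supplied via spark-submit
val spark = SparkSession.builder().appName("SparkApp").getOrCreate()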
When you submit your application to Spark, you have to either include the MySQL connector in your final jar file (see the sbt-assembly sketch below), or tell spark-submit to get the package as a dependency:
spark-submit --packages mysql:mysql-connector-java:6.0.5 ...
This flag also works with spark-shell or pyspark.
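For the fat-jar route, a minimal sketch using the sbt-assembly plugin; the plugin version and the main class name are assumptions, not part of the original answer:

// project/plugins.sbt (hypothetical plugin version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

Then build with sbt assembly and submit the resulting jar, e.g. spark-submit --class com.example.SparkApp target/scala-2.11/SparkApp-assembly-1.0.jar. When going this route, the Spark dependencies themselves are usually marked % "provided" in build.sbt so they are not bundled into the assembly.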
Upvotes: 2
Reputation: 2108
Your MySQL driver class com.mysql.jdbc.Driver is not present on your classpath at runtime. If you are running your Spark job with spark-submit, then you have at least two options:
Upvotes: 0