Spark - jdbc write fails in Yarn cluster mode but works in spark-shell

Question

I am using Spark 1.6.2, Hadoop 2.6, Scala 2.10.5 and Java 1.7

I am using JDBC to read data from MSSQL and this works without any problem:

val hqlContext = new HiveContext(sc)

val url = "jdbc:sqlserver://1.1.1.1:1111;database=CIQOwnershipProcessing;user=OwnershipUser;password=Ownership123"

val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";

val df1 = hqlContext.read.format("jdbc").options(
   Map("url" -> url, "driver" -> driver,
  "dbtable" -> "(select * from OwnershipStandardization_PositionSequence_tbl) as ps")).load()

And, while writing back dataframe to MSSQL, I am using the JDBC write as shown below. This works fine in Spark-shell but fails when I do spark-submit in Yarn-Cluster mode. What am I missing ?

val prop = new java.util.Properties
df1.write.mode("Overwrite").jdbc(url, "CIQOwnershipProcessing.dbo.df_sparkop",prop)

This is how my spark-submit command looks like. As you can see, I am passing the SQLJDBC jar path too. And, I have also specified the jdbc jar path in "spark.executor.extraClassPath" property in spark-defaults.conf on all nodes of the cluster. Since the JDBC read is working, I doubt if it has anything to do with the classpaths.

spark-submit --class com.spgmi.csd.OshpStdCarryOver --master  yarn --deploy-mode cluster --conf spark.yarn.executor.memoryOverhead=2048 --num-executors 1 --executor-cores 2 --driver-memory 3g --executor-memory 8g --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,/usr/share/java/sqljdbc_4.1/enu/sqljdbc41.jar --files $SPARK_HOME/conf/hive-site.xml $SPARK_HOME/lib/spark-poc2-17.1.0.jar

The error thrown in the Yarn-Cluster mode is:

17/01/05 10:21:31 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.InstantiationException: org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper java.lang.InstantiationException: org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper at java.lang.Class.newInstance(Class.java:368) at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:46) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:53) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52) at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:278) at com.spgmi.csd.OshpStdCarryOver$.main(SparkOshpStdCarryOver.scala:175) at com.spgmi.csd.OshpStdCarryOver.main(SparkOshpStdCarryOver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)

nirali.gandhi · Accepted Answer

I was facing the same issue. I resolved it by setting connection property in prop.

val prop = new java.util.Properties
prop.setProperty("driver","com.mysql.jdbc.Driver")

now pass this prop in

df1.write.mode("Overwrite").jdbc(url, "CIQOwnershipProcessing.dbo.df_sparkop",prop)

Spark - jdbc write fails in Yarn cluster mode but works in spark-shell

Answers (2)

Related Questions