Reputation: 61
I need to connect to a Postgres database from a Scala/Spark app. It works perfectly when I run it in my IDE; however, the exception below is thrown when I try to run the packaged executable jar.
I trigger the executable jar with:
java -cp HighestPerformingCampaign-assembly-1.0.jar com.scala.Executor
Exception thrown:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: jdbc. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:689)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:743)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:266)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
at com.scala.Executor$.findHighestCampaign(Executor.scala:31)
at com.scala.Executor$.main(Executor.scala:15)
at com.scala.Executor.main(Executor.scala)
Caused by: java.lang.ClassNotFoundException: jdbc.DefaultSource
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:602)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:663)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:663)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:663)
... 6 more
My build.sbt file is set up as follows:
name := "HighestPerformingCampaign"
version := "1.0"
crossScalaVersions := Seq("2.11.12", "2.12.12")
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.1"
libraryDependencies += "org.postgresql" % "postgresql" % "9.3-1102-jdbc41"
mainClass := Some("com.scala.Executor")
assemblyJarName in assembly := "HighestPerformingCampaign-assembly-1.0.jar"
and I am using the sbt-assembly plugin, declared under the project folder, to generate the jar:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
Is there something I'm missing here that is preventing the driver from getting added to the packaged jar? My connection details are specified as follows:
val df = spark
  .sqlContext
  .read
  .format("jdbc")
  .option("url", "jdbc:postgresql:postgres")
  .option("user", "postgres")
  .option("password", "postgres")
  .option("query", query)
  .load()
Upvotes: 0
Views: 603
Reputation: 193
You can check whether your jar actually contains the required classes using jar -tf HighestPerformingCampaign-assembly-1.0.jar. If it does not contain the required class jdbc.DefaultSource (which is likely the case), it means the fat/packaged jar is not being built as expected. Instead of creating this fat jar, I would suggest creating Artifacts from the IDE (in IntelliJ it's under Project Settings -> Artifacts), which basically creates and puts all the dependent jars under some artifacts root directory. You can then provide the path of this directory to the java command, e.g. java -cp HighestPerformingCampaign-assembly-1.0.jar:<absolute path to artifact root> com.scala.Executor
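For reference, here is a rough sketch of both steps (the grep patterns are only assumptions about what to look for, and the artifact path stays a placeholder, as above):
# 1. Inspect the assembly jar for the classes the error complains about
jar -tf HighestPerformingCampaign-assembly-1.0.jar | grep -i postgresql
jar -tf HighestPerformingCampaign-assembly-1.0.jar | grep -i DefaultSource
# 2. Run with the IDE-built artifact directory on the classpath
#    (depending on the layout, a wildcard such as <artifact root>/* may be needed
#     so that the jars inside the directory are picked up)
java -cp HighestPerformingCampaign-assembly-1.0.jar:<absolute path to artifact root> com.scala.Executor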
Upvotes: 1