Reputation: 2083
I am trying to connect to a Hive server from Scala code, as below.
import java.sql.{Connection, DriverManager}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

def getHiveConnection(): Connection = {
  println("Building Hive connection..")
  val driver = "org.apache.hive.jdbc.HiveDriver"
  val user = "user"
  val pwd = "pwd"
  val url = "jdbc:hive2://ip-00-000-000-000.ec2.internal:00000/dbname;principal=hive/[email protected]"
  var connection: Connection = null
  val conf = new Configuration()
  conf.set("hadoop.security.authentication", "Kerberos")
  UserGroupInformation.setConfiguration(conf)
  try {
    println("Setting the driver..")
    Class.forName(driver)
    println("pre connection")
    if ((connection == null) || connection.isClosed()) {
      connection = DriverManager.getConnection(url, user, pwd)
      println("Hive connection established.")
    }
  } catch {
    case cnf: ClassNotFoundException =>
      println("Invalid driver used. Check the settings.")
      cnf.printStackTrace()
    case e: Exception =>
      println("Other exception.")
      e.printStackTrace()
  }
  connection
}
I create a jar file from the program in IntelliJ and then run the jar using spark-submit, as I need to run some SQL that is not supported by Spark; the connection above is used for that, as sketched below.
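For context, the connection is consumed through plain JDBC, roughly like this (the statement shown is only an illustration; the actual SQL in my job differs):

val connection = getHiveConnection()
val stmt = connection.createStatement()
// Hive-only DDL of this kind is what Spark SQL cannot run for me
stmt.execute("ALTER TABLE dbname.tbl EXCHANGE PARTITION (dt='2019-01-01') WITH TABLE dbname.tbl_stage")
stmt.close()
connection.close()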
spark-submit:
SPARK_MAJOR_VERSION=2 spark-submit --class com.package.program.Begin \
  --master=yarn \
  --conf spark.ui.port=4090 \
  --driver-class-path /home/username/testlib/inputdir/myjars/hive-jdbc-2.3.5.jar \
  --conf spark.jars=/home/username/testlib/inputdir/myjars/hive-jdbc-2.3.5.jar \
  --executor-cores 4 \
  --executor-memory 4G \
  --keytab /home/username/username.keytab \
  --principal [email protected] \
  --files /$SPARK_HOME/conf/hive-site.xml,connection.properties \
  --name Splinter \
  splinter_2.11-0.1.jar
When I submit the code, it fails with the exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/rpc/thrift/TCLIService$Iface
To be precise, the exception comes at the line:
connection = DriverManager.getConnection(url, user, pwd)
The dependencies I added in the sbt build file are below:
name := "Splinter"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0",
  "org.apache.spark" %% "spark-sql" % "2.0.0",
  "org.json4s" %% "json4s-jackson" % "3.2.11",
  "org.apache.httpcomponents" % "httpclient" % "4.5.3",
  "org.apache.spark" %% "spark-hive" % "2.0.0"
)
libraryDependencies += "org.postgresql" % "postgresql" % "42.1.4"
libraryDependencies += "org.apache.hadoop" % "hadoop-auth" % "2.6.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.2.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-common" % "2.6.5"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.6.5"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % "2.6.5" % "provided"
libraryDependencies += "org.apache.hive" % "hive-jdbc" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-common" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-metastore" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-service" % "2.3.5"
libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.26"
libraryDependencies += "commons-cli" % "commons-cli" % "1.4"
libraryDependencies += "org.apache.hive" % "hive-service-rpc" % "2.1.0"
libraryDependencies += "org.apache.hive" % "hive-cli" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-exec" % "2.3.4" excludeAll
ExclusionRule(organization = "org.pentaho")
Along with the dependencies, I also passed all the jars from that directory via --jars in spark-submit (along the lines of the snippet below), and that didn't work either.
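For illustration, the --jars form I mean is the following (the jar list is abbreviated here; it actually included every jar in that directory):

SPARK_MAJOR_VERSION=2 spark-submit --class com.package.program.Begin \
  --master=yarn \
  --jars /home/username/testlib/inputdir/myjars/hive-jdbc-2.3.5.jar,/home/username/testlib/inputdir/myjars/hive-service-2.3.5.jar \
  --keytab /home/username/username.keytab \
  --principal [email protected] \
  --name Splinter \
  splinter_2.11-0.1.jar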
The full exception stack trace is below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/rpc/thrift/TCLIService$Iface
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at com.data.stages.ExchangePartition.getHiveConnection(ExchangePartition.scala:30)
at com.data.stages.ExchangePartition.exchange(ExchangePartition.scala:44)
at com.partition.source.Pickup$.main(Pickup.scala:124)
at com.partition.source.Pickup.main(Pickup.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 16 more
Could anyone let me know what dependencies I am missing in the sbt file? If not, what mistake am I making here? The same kind of code works in Java with the same libraries (dependencies) in the project, and I can't understand what is wrong here. Any help is much appreciated.
Upvotes: 4
Views: 1526
Reputation: 29165
I don't know whether you are using client or cluster mode for spark-submit.
Could anyone let me know what dependencies I am missing in the sbt file?

The dependency you added is correct:
libraryDependencies += "org.apache.hive" % "hive-jdbc" % "2.3.5"
I would suggest you go with an uber jar, i.e. package all the dependencies into a single jar, so that nothing is missed or left out.
See how to make an uber jar here.
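A minimal sketch of the sbt-assembly route (the plugin version and merge rules below are illustrative, not taken from your build). In project/plugins.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

And in build.sbt:

// Discard duplicate META-INF entries that otherwise break the merge,
// and take the first copy of any other conflicting file.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

Then build with sbt assembly and pass the resulting fat jar to spark-submit instead of splinter_2.11-0.1.jar.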
Also, add this code to your driver to understand which jars are coming into your classpath.
def urlsinclasspath(cl: ClassLoader): Array[java.net.URL] = cl match {
  case null => Array()
  // A URLClassLoader exposes its jars directly; keep walking up the parent chain
  case u: java.net.URLClassLoader => u.getURLs() ++ urlsinclasspath(cl.getParent)
  case _ => urlsinclasspath(cl.getParent)
}

urlsinclasspath(getClass.getClassLoader).foreach(println)
Upvotes: 3