Metadata

Reputation: 2083

Unable to connect to Hive server using JDBC connection

I am trying to connect to the Hive server from Scala code, as shown below.

import java.sql.{Connection, DriverManager}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

def getHiveConnection(): Connection = {
    println("Building Hive connection..")
    val driver   = "org.apache.hive.jdbc.HiveDriver"
    val user     = "user"
    val pwd      = "pwd"
    val url      = "jdbc:hive2://ip-00-000-000-000.ec2.internal:00000/dbname;principal=hive/[email protected]"
    var connection: Connection = null

    // Configure Hadoop security for Kerberos authentication
    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "Kerberos")
    UserGroupInformation.setConfiguration(conf)

    try {
        println("Setting the driver..")
        // Load the Hive JDBC driver class
        Class.forName(driver)
        println("pre connection")
        if ((connection == null) || connection.isClosed()) {
            connection = DriverManager.getConnection(url, user, pwd)
            println("Hive connection established.")
        }
    } catch {
        case cnf: ClassNotFoundException =>
            println("Invalid driver used. Check the settings.")
            cnf.printStackTrace()
        case e: Exception =>
            println("Other exception.")
            e.printStackTrace()
    }
    connection
}

I build a jar file from the program in IntelliJ and then run it using spark-submit, as I need to run some SQL that is not supported by Spark.

spark-submit:

SPARK_MAJOR_VERSION=2 spark-submit --class com.package.program.Begin --master=yarn \
  --conf spark.ui.port=4090 \
  --driver-class-path /home/username/testlib/inputdir/myjars/hive-jdbc-2.3.5.jar \
  --conf spark.jars=/home/username/testlib/inputdir/myjars/hive-jdbc-2.3.5.jar \
  --executor-cores 4 --executor-memory 4G \
  --keytab /home/username/username.keytab --principal [email protected] \
  --files /$SPARK_HOME/conf/hive-site.xml,connection.properties \
  --name Splinter splinter_2.11-0.1.jar

When I submit the code, it fails with the exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/rpc/thrift/TCLIService$Iface

To be precise, the exception is thrown at this line:

connection = DriverManager.getConnection(url, user, pwd)

The dependencies I added in the sbt build file are shown below:

name := "Splinter"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0",
  "org.apache.spark" %% "spark-sql" % "2.0.0",
  "org.json4s" %% "json4s-jackson" % "3.2.11",
  "org.apache.httpcomponents" % "httpclient" % "4.5.3",
  "org.apache.spark" %% "spark-hive" % "2.0.0",
)
libraryDependencies += "org.postgresql" % "postgresql" % "42.1.4"
libraryDependencies += "org.apache.hadoop" % "hadoop-auth" % "2.6.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.2.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-common" % "2.6.5"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.6.5"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % "2.6.5" % "provided"
libraryDependencies += "org.apache.hive" % "hive-jdbc" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-common" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-metastore" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-service" % "2.3.5"
libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.26"
libraryDependencies += "commons-cli" % "commons-cli" % "1.4"
libraryDependencies += "org.apache.hive" % "hive-service-rpc" % "2.1.0"
libraryDependencies += "org.apache.hive" % "hive-cli" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-exec" % "2.3.4" excludeAll
  ExclusionRule(organization = "org.pentaho")

Along with the dependencies, I also passed all the jars from that directory via --jars in spark-submit, and that didn't work either; a sketch of that form is below.
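For reference, --jars takes a comma-separated list of jar paths that Spark ships to both the driver and the executors; a minimal sketch of that form (reusing the paths from the command above, not the exact command I ran) is:

SPARK_MAJOR_VERSION=2 spark-submit --class com.package.program.Begin --master=yarn \
  --jars /home/username/testlib/inputdir/myjars/hive-jdbc-2.3.5.jar \
  --keytab /home/username/username.keytab --principal [email protected] \
  --files /$SPARK_HOME/conf/hive-site.xml,connection.properties \
  --name Splinter splinter_2.11-0.1.jar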

The full exception stack trace is below:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/rpc/thrift/TCLIService$Iface
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:270)
    at com.data.stages.ExchangePartition.getHiveConnection(ExchangePartition.scala:30)
    at com.data.stages.ExchangePartition.exchange(ExchangePartition.scala:44)
    at com.partition.source.Pickup$.main(Pickup.scala:124)
    at com.partition.source.Pickup.main(Pickup.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 16 more

Could anyone let me know what dependencies I am missing in the sbt file? If not, what mistake am I making here? The same kind of code works in Java with the same libraries (dependencies) in the project, and I cannot understand what is wrong here. Any help is much appreciated.

Upvotes: 4

Views: 1526

Answers (1)

Ram Ghadiyaram

Reputation: 29165

I don't know whether you are using client or cluster mode for spark-submit.

Could anyone let me know what dependencies am I missing in the sbt file ?

The dependency you added is correct:

libraryDependencies += "org.apache.hive" % "hive-jdbc" % "2.3.5"

I would suggest you go with an uber jar, i.e., package all the jars with their dependencies into one jar so that nothing is missed or left out.

See how to make an uber jar here.
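As an illustration, with the sbt-assembly plugin (an assumption; the link above covers the details) the setup could look roughly like this:

// project/plugins.sbt -- add the sbt-assembly plugin (version is an example)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt -- set the main class and resolve the duplicate files that
// overlapping Hadoop/Hive dependencies inevitably pull in
mainClass in assembly := Some("com.package.program.Begin")
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

Then build with sbt assembly and pass the resulting fat jar to spark-submit in place of splinter_2.11-0.1.jar.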

Also add this code to your driver to understand which jars are coming into your classpath.

// Print every URL on the driver's classpath
urlsinclasspath(getClass.getClassLoader).foreach(println)

// Walk up the classloader hierarchy, collecting the URLs of every URLClassLoader
def urlsinclasspath(cl: ClassLoader): Array[java.net.URL] = cl match {
  case null                       => Array()
  case u: java.net.URLClassLoader => u.getURLs() ++ urlsinclasspath(cl.getParent)
  case _                          => urlsinclasspath(cl.getParent)
}
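For example (a sketch; Begin is assumed to be the driver class from the spark-submit command above, with urlsinclasspath defined alongside it), call it first thing in main:

object Begin {
  def main(args: Array[String]): Unit = {
    // Dump the classpath before connecting; if no jar containing
    // org.apache.hive.service.rpc.thrift.TCLIService shows up here,
    // hive-service-rpc never reached the driver, which matches the error.
    urlsinclasspath(getClass.getClassLoader).foreach(println)
    // ... rest of the driver
  }
}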

Upvotes: 3
