Rajarshi Maity

Reputation: 23

Spark Hive Integration issue with eclipse

I am using the cloudera-quickstart-vm-5.12.0-0-virtualbox-disk1 VM for my Big Data practice.

I am trying to integrate Spark and Hive using Scala code. The code is written in Eclipse on Windows; once it is written, I build a JAR and copy it to my Cloudera cluster, where I execute it with spark-submit.

Spark Logic -

package sparkhiveintegration_package_spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object objSparkHiveIntegration {

  def main(arg: Array[String]): Unit = {
    val warehouseLocation = "/user/hive/warehouse/"

    val sc = new SparkConf().setAppName("SparkHiveIntegrationTest").setMaster("local[*]")
      .set("spark.sql.warehouse.dir", warehouseLocation)

    val sctxt = new SparkContext(sc)
    sctxt.setLogLevel("ERROR")

    // getOrCreate() reuses the SparkContext created above
    val ssc = SparkSession.builder().config(sc)
      .enableHiveSupport().getOrCreate()

    import ssc.implicits._

    ssc.sql("use practise")
    val sprk_hve_read_df = ssc.sql("select * from customers")
    sprk_hve_read_df.show()

    //sprk_hve_read_df.write.format("csv").option("header","true").mode("overwrite")
    //               .save("user/cloudera/sprk_hive_integration/")
  }
}
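For reference, here is a minimal sketch of how the same session could be built without the separate SparkContext, pointing Spark at the metastore explicitly. This is not my working code: the thrift://localhost:9083 metastore URI is an assumption based on the quickstart VM defaults.

import org.apache.spark.sql.SparkSession

object objSparkHiveIntegrationSketch {
  def main(args: Array[String]): Unit = {
    // One Hive-enabled session; getOrCreate() creates the underlying
    // SparkContext itself, so no explicit `new SparkContext(...)` is needed.
    val spark = SparkSession.builder()
      .appName("SparkHiveIntegrationTest")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse/")
      // Assumption: the quickstart metastore listens on the default thrift port.
      .config("hive.metastore.uris", "thrift://localhost:9083")
      .enableHiveSupport()
      .getOrCreate()

    spark.sparkContext.setLogLevel("ERROR")

    spark.sql("use practise")
    spark.sql("select * from customers").show()
  }
}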

Steps I took -

In Eclipse I added the Maven dependencies to pom.xml as mentioned in https://sparkbyexamples.com/spark/how-to-connect-spark-to-remote-hive/

My Spark version is 2.3.1. I added the dependencies in Eclipse following the steps mentioned in https://learnjava.co.in/how-to-add-maven-dependencies-via-eclipse/

I then copied hive-site.xml, core-site.xml, and hdfs-site.xml to the Spark conf directory, which is /usr/local/spark/conf/

On the Hive side, I created a database called practise from the Hive shell (create database practise; and so on).

I used a Sqoop job to move a customer CSV file into Hive. Now my customers table is in the practise database under Hive; the table path is /user/hive/warehouse/practise.db/customers and the data file is /user/hive/warehouse/practise.db/customers/customer.csv.
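To confirm which metastore the Spark session actually talks to, a quick catalog check like this could be added before the use practise call (ssc is the session from the code above); if only the default database shows up, Spark is reading a local Derby metastore rather than the cluster's Hive metastore:

// List everything the session's catalog can see. With a correctly wired
// Hive metastore this should include the practise database.
ssc.catalog.listDatabases().show(truncate = false)
// listTables throws if the database is missing, so guard it
if (ssc.catalog.databaseExists("practise"))
  ssc.catalog.listTables("practise").show(truncate = false)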

When I run the SQL statement select * from practise.customers limit 10; in the Hive shell, it retrieves the data correctly.

So I want the code above to retrieve that data; then I want to perform some transformations, actions, joins, etc., and write the result in Avro format to some HDFS location.
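On Spark 2.3 the Avro data source is not built in (it only becomes the built-in "avro" format in Spark 2.4), so I assume I would need the external spark-avro package. A sketch of the write, assuming com.databricks:spark-avro_2.11:4.0.0 is on the classpath and using a hypothetical output path:

// Assumes the spark-avro package is on the classpath, e.g. via
// spark-submit --packages com.databricks:spark-avro_2.11:4.0.0
// The output path below is just an example location.
sprk_hve_read_df.write
  .format("com.databricks.spark.avro")
  .mode("overwrite")
  .save("/user/cloudera/sprk_hive_integration_avro/")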

My spark-submit command - spark-submit --master local[*] --class sparkhiveintegration_package_spark.objSparkHiveIntegration /home/cloudera/externalJars/SparkRDDPractiseSession-0.0.1-SNAPSHOT.jar

When I run this command I get errors like:

table or view not found.
Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'practise' not found;
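Would explicitly shipping hive-site.xml with the job help? The --files flag is a standard spark-submit option; the path below is simply where I copied the file earlier:

spark-submit --master local[*] --files /usr/local/spark/conf/hive-site.xml --class sparkhiveintegration_package_spark.objSparkHiveIntegration /home/cloudera/externalJars/SparkRDDPractiseSession-0.0.1-SNAPSHOT.jar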

Upvotes: 1

Views: 48

Answers (0)
