Nav_cfc

Reputation: 164

Spark doesn't seem to use the same warehouse that Hive uses

I have started using Spark 2.0 in Eclipse, by making a Maven project and pulling in all the latest dependencies. I am able to run Hive queries without any problems. My concern is that Spark creates a separate warehouse for Hive and doesn't use the data warehouse that I want. So for all the Hive tables that I have on my server, I'm not able to read those tables into my Spark Datasets and do any transformations. I'm only able to create and work on new tables, but I want to read my existing tables in Hive.

My hive-site.xml:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>user name for connecting to mysql server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password for connecting to mysql server</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/local/Cellar/hive-1.1.0/apache-hive-1.1.0-bin/spark-warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
</configuration>

Upvotes: 1

Views: 18105

Answers (3)

IceMimosa

Reputation: 41

You should configure this in spark-defaults.conf:

spark.sql.warehouse.dir hdfs://MA:8020/user/hive/warehouse

From http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
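To confirm which warehouse directory the running session actually picked up, you can read the property back at runtime (a minimal sketch, assuming an existing `SparkSession` named `spark`):

    // Print the warehouse directory the current SparkSession is using
    println(spark.conf.get("spark.sql.warehouse.dir"))

If this still prints a local `spark-warehouse` path rather than the HDFS location from spark-defaults.conf, the config file is not being read by your application.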

Upvotes: 0

Ram Ghadiyaram

Reputation: 29237

As I understand it, you are able to query the table from hive/beeline, but you cannot query the same table from your Spark program.

  • You can print all the configuration from your Spark program to verify this, like so.

Since you are using Spark 2.0, please verify the SparkSession below:

val spark = SparkSession
  .builder()
  .appName("yourappname")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

SparkSession exposes "catalog" as a public instance that contains methods for working with the metastore (i.e., the data catalog). Since these methods return a Dataset, you can use the Dataset API to access or view the data.

Also try the below:

    // fetch metadata from the catalog
    spark.catalog.listDatabases.show(false)
    spark.catalog.listTables.show(false)

and then print spark.conf.getAll.mkString("\n").

You can then see whether any Hive properties (like hive.metastore.warehouse.dir or hive.metastore.uris) that were in hive-site.xml differ from the properties above.
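To make that comparison easier, you could filter the session configuration down to just the metastore-related keys (a sketch, assuming the `spark` session built above; the key filter is an illustrative choice):

    // Show only properties relevant to the Hive metastore/warehouse
    spark.conf.getAll
      .filter { case (k, _) => k.startsWith("hive.") || k.contains("warehouse") }
      .foreach { case (k, v) => println(s"$k = $v") }

Any mismatch between these values and your hive-site.xml indicates Spark is not reading that file.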

Upvotes: 1

Nirmal Ram

Reputation: 1210

In hive-site.xml, add:

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://HOST_IP_ADDRESS:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>

Restart the Hive service, and then either:

1) Copy hive-site.xml from the $HIVE_CONF dir to the $SPARK_CONF dir

or 2)

HiveContext hiveContext = new HiveContext(sc);

hiveContext.setConf("hive.metastore.uris", "thrift://HOST_IP_ADDRESS:9083");
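Since the question uses Spark 2.0, where HiveContext is deprecated, the same setting can also be applied when building a SparkSession (a sketch; the app name is a placeholder and the thrift URI is the same example address as above):

    import org.apache.spark.sql.SparkSession

    // Spark 2.x equivalent: point the session at the remote Hive metastore
    val spark = SparkSession.builder()
      .appName("hive-metastore-example")  // hypothetical app name
      .config("hive.metastore.uris", "thrift://HOST_IP_ADDRESS:9083")
      .enableHiveSupport()
      .getOrCreate()

    // The existing Hive databases should now be visible
    spark.sql("show databases").show(false)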

Upvotes: 0
