Reputation: 164
I have started using Spark 2.0 in Eclipse by creating a Maven project and pulling in all the latest dependencies. I am able to run Hive queries without any problems. My concern is that Spark creates its own warehouse for Hive and doesn't use the data warehouse I already have. So, for all the Hive tables I have on my server, I'm not able to read those tables into Spark Datasets and do any transformations. I'm only able to create and work on new tables, but I want to read my existing tables in Hive.
My hive-site.xml :-
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>user name for connecting to MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password for connecting to MySQL server</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/local/Cellar/hive-1.1.0/apache-hive-1.1.0-bin/spark-warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
</configuration>
Upvotes: 1
Views: 18105
Reputation: 41
You should configure this in spark-defaults.conf:
spark.sql.warehouse.dir hdfs://MA:8020/user/hive/warehouse
From http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
Upvotes: 0
Reputation: 29237
As I understand it, you are able to query from hive/beeline but cannot query the same table from your Spark program.
Since you are using Spark 2.0, please verify the SparkSession below:
val spark = SparkSession
.builder()
.appName("yourappname")
.config("spark.sql.warehouse.dir", warehouseLocation)
.enableHiveSupport()
.getOrCreate()
SparkSession exposes "catalog" as a public instance that contains methods that work with the metastore (i.e., the data catalog). Since these methods return a Dataset, you can use the Dataset API to access or view the data.
Also try the following:
//fetch metadata data from the catalog
spark.catalog.listDatabases.show(false)
spark.catalog.listTables.show(false)
and then print the session configuration with println(spark.conf.getAll.mkString("\n")),
so you can see whether any of the Hive properties (like hive.metastore.warehouse.dir
or hive.metastore.uris
) differ from the values in your hive-site.xml.
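To make that comparison easier, you can filter the session configuration down to just the Hive-related keys. A minimal sketch; the hiveProps helper is only an illustration, not part of any Spark API:

```scala
// Hypothetical helper: keep only Hive-related entries from a property map.
def hiveProps(conf: Map[String, String]): Map[String, String] =
  conf.filter { case (key, _) => key.startsWith("hive.") }

// With a live SparkSession you would pass spark.conf.getAll, e.g.:
//   hiveProps(spark.conf.getAll).foreach { case (k, v) => println(s"$k = $v") }

// Standalone demonstration with sample values:
val sample = Map(
  "hive.metastore.warehouse.dir" -> "/user/hive/warehouse",
  "hive.metastore.uris"          -> "thrift://localhost:9083",
  "spark.app.name"               -> "yourappname"
)
hiveProps(sample).foreach { case (k, v) => println(s"$k = $v") }
```

If the printed values don't match your hive-site.xml, Spark is not picking up that file and is falling back to its own embedded metastore.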
Upvotes: 1
Reputation: 1210
In hive-site.xml, add:
<property>
<name>hive.metastore.uris</name>
<value>thrift://HOST_IP_ADDRESS:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
Restart the Hive service, and then either:
1) copy hive-site.xml from the $HIVE_CONF dir to the $SPARK_CONF dir,
or 2) set the property programmatically:
HiveContext hiveContext = new HiveContext(sc);
hiveContext.setConf("hive.metastore.uris", "thrift://HOST_IP_ADDRESS:9083");
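Since HiveContext is deprecated in Spark 2.0, the same setting can go on the SparkSession builder instead. A minimal sketch; the thriftUri helper is just an illustration (9083 is the default metastore Thrift port), and HOST_IP_ADDRESS is a placeholder you must fill in:

```scala
// Hypothetical helper to build the metastore Thrift URI.
def thriftUri(host: String, port: Int = 9083): String = s"thrift://$host:$port"

// With Spark 2.0 the equivalent of hiveContext.setConf would be:
//   val spark = SparkSession.builder()
//     .config("hive.metastore.uris", thriftUri("HOST_IP_ADDRESS"))
//     .enableHiveSupport()
//     .getOrCreate()

println(thriftUri("HOST_IP_ADDRESS"))
```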
Upvotes: 0