Reputation: 31
Using spark-shell and HiveContext, I tried to show all the Hive tables. But when I start the Thrift server and use beeline to check all tables, it is empty there.
The Spark SQL documentation says:
(1) if I put hive-site.xml in Spark's conf/ directory, the saveAsTable method on a DataFrame will persist the table to the Hive metastore specified in that XML file;
(2) if I put hive-site.xml in Spark's conf/ directory, the Thrift server will connect to the Hive metastore specified in that XML file.
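For reference, this is the kind of call I mean. A minimal sketch against the Spark 1.x API; the table name "people" and the input file are hypothetical:

// Hypothetical sketch: persist a DataFrame as a Hive table (Spark 1.4+ API)
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val df = hc.read.json("people.json")  // any DataFrame would do
// Written to whichever metastore the HiveContext points at:
// the one from conf/hive-site.xml if present, otherwise the default
df.write.saveAsTable("people")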
Right now I don't have any such XML file in conf/, so I suppose they should both use the default configuration. But clearly that is not the case. Could anyone help point out the reason?
Thank you so much.
When I use spark-shell, I see the following line:

INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

Does this cause the two (spark-shell and the Thrift server) to see different Hive metastores?
The code I tried in spark-shell:
// Create a HiveContext on top of the existing SparkContext (sc)
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
// Ask Hive for the list of tables and materialize the result
val df = hc.sql("show tables")
df.collect()
I tried "show tables" in beeline.
Upvotes: 1
Views: 1357
Reputation: 31
Turns out it is because I didn't know enough about Hive.
Every time HiveQL runs (for example "SHOW TABLES"), if there is no metastore_db in the current working directory, Hive creates one. metastore_db stores all the table schemas so that they can be queried.
So the solution is to run all Hive-related programs from the same folder. In my case, I should launch start-thriftserver.sh and spark-shell from the same folder; then both of them can see the same tables.
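Concretely, something like this (the install path /opt/spark is a placeholder; adjust to your layout):

# Start both from the same working directory so they share ./metastore_db
cd /opt/spark
./sbin/start-thriftserver.sh
./bin/spark-shell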
Furthermore, if I edit hive-site.xml to specify the metastore location, it should be possible to keep the metastore in one fixed place regardless of the working directory, which I will explore more.
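For anyone exploring the same idea, here is a minimal hive-site.xml sketch that pins the embedded Derby metastore to a fixed directory. The path /var/lib/hive/metastore_db is a placeholder, and I have not verified this setup myself:

<configuration>
  <property>
    <!-- JDBC URL for the metastore; an absolute databaseName fixes its location -->
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/var/lib/hive/metastore_db;create=true</value>
  </property>
</configuration>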
Upvotes: 2