Yang Du

Reputation: 31

How to make Spark SQL and the Thrift server see the same Hive metastore?

Using spark-shell with a HiveContext, I can list all the Hive tables. But when I start the Thrift server and use beeline to list the tables, nothing shows up.

The Spark SQL documentation says that (1) if I put hive-site.xml in Spark's conf/ directory, the DataFrame saveAsTable method will persist tables to the Hive metastore specified in that file, and (2) with the same hive-site.xml in conf/, the Thrift server will connect to the Hive metastore specified in that file.

Right now I don't have any such xml file in conf/, so I assume both should use the default configuration. But clearly that is not the case. Could anyone help point out the reason?
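
For reference, this is the kind of flow I expect to work (a minimal sketch; the table name and sample data are made up):

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
import hc.implicits._
val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "value")
df.write.saveAsTable("test_table")  // persist to the Hive metastore
// Expectation: "show tables" from beeline (via the Thrift server) should now list test_table.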

Thank you so much.


When I use spark-shell, I see the following line:

INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

Does this cause the two (spark-shell and the Thrift server) to see different Hive metastores?


The code I tried in spark-shell:

val hc = new org.apache.spark.sql.hive.HiveContext(sc)  // sc is the SparkContext provided by spark-shell
val df = hc.sql("show tables")                          // query the metastore through HiveContext
df.collect()                                            // this lists my tables as expected

I tried "show tables" on beeline;

Upvotes: 1

Views: 1357

Answers (1)

Yang Du

Reputation: 31

Turns out it was because I didn't know enough about Hive.

Every time HiveQL runs (for example "SHOW TABLES"), if there is no metastore_db directory in the current working directory, Hive creates one. This is the embedded Derby metastore used by default when no hive-site.xml is present; metastore_db stores all the table schemas so they can be queried.

So the solution is to run all the Hive-related programs from the same folder. In my case, I should run start-thriftserver.sh and spark-shell from the same directory; then both share the same metastore_db and can see the same tables.
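
A minimal sketch of what worked for me (assuming a stock Spark layout; any directory works as long as both commands run from the same one):

cd $SPARK_HOME
./bin/spark-shell                # creates ./metastore_db on the first Hive query
./sbin/start-thriftserver.sh     # started from the same directory, so it finds the same ./metastore_db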

Furthermore, by editing hive-site.xml to specify the metastore location, the metastore can be kept in a fixed location regardless of the working directory, which I will explore more.
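
For example, something like this in Spark's conf/hive-site.xml should pin the embedded Derby metastore to a fixed absolute path (the path here is made up):

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/path/to/metastore_db;create=true</value>
    <description>Fixed location for the embedded Derby metastore (made-up path)</description>
  </property>
</configuration>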

Upvotes: 2
