Reputation: 848
HIVE has a metastore and HIVESERVER2 listens for SQL requests; with the help of metastore, the query is executed and the result is passed back. The Thrift framework is actually customised as HIVESERVER2. In this way, HIVE is acting as a service. Via programming language, we can use HIVE as a database.
The relationship between Spark-SQL and HIVE is that:
Spark-SQL just utilises the HIVE setup (HDFS file system, HIVE Metastore, Hiveserver2). When we invoke /sbin/start-thriftserver2.sh (present in spark installation), we are supposed to give hiveserver2 port number, and the hostname. Then via spark's beeline, we can actually create, drop and manipulate tables in HIVE. The API can be either Spark-SQL or HIVE QL. If we create a table / drop a table, it will be clearly visible if we login into HIVE and check(say via HIVE beeline or HIVE CLI). To put in other words, changes made via Spark can be seen in HIVE tables.
My understanding is that Spark does not have its own meta store setup like HIVE. Spark just utilises the HIVE setup and simply the SQL execution happens via Spark SQL API.
Is my understanding correct here?
Then I am little confused about the usage of bin/spark-sql.sh (which is also present in Spark installation). Documentation says that via this SQL shell, we can create tables like we do above (via Thrift Server/Beeline). Now my question is: How the metadata information is maintained by spark then?
Or like the first approach, can we make spark-sql CLI to communicate to HIVE (to be specific: hiveserver2 of HIVE) ? If yes, how can we do that ?
Thanks in advance!
Upvotes: 0
Views: 140
Reputation: 192023
My understanding is that Spark does not have its own meta store setup like HIVE
Spark will start a Derby server on its own, if a Hive metastore is not provided
can we make spark-sql CLI to communicate to HIVE
Start an external metastore process, add a hive-site.xml
file to $SPARK_CONF_DIR
with hive.metastore.uris
, or use SET
SQL statements for the same.
Then spark-sql
CLI should be able to query Hive tables. From code, you need to use enableHiveSupport()
method on the SparkSession.
Upvotes: 1