Pa1

Reputation: 891

Spark SQL vs HIVE on Spark

I am going through the Spark SQL documentation and trying to understand the difference between Spark SQL and Hive on Spark.

  1. Consider a case where I start a Spark session without any explicit Hive support (for example, without copying hive-site.xml) and then persist a table in my Spark program. Where will the data and metadata be stored? Will Spark create a new Hive metastore (such as Derby)?
  2. Consider a case where I start a Spark session with Hive support, for example by copying hive-site.xml so that Spark is aware of the existing Hive installation. If I then persist the table, will the metadata be stored in my existing Hive metastore and the data in the warehouse directory on HDFS?
  3. If I run Hive with the execution engine property changed to Spark, is that the same as case 2 above?

Thanks.

Upvotes: 1

Views: 1004

Answers (1)

Sandip Sinha

Reputation: 49

  1. When you start a Spark session, the data can be stored in S3 or HDFS. Spark will not use Hive on its own; you have to enable Hive support explicitly.

  2. Yes. If you use 'saveAsTable' against a Hive table, the data will be persisted in HDFS. Bear in mind that if the HDFS instance goes away, for example when an EMR cluster is terminated, the table is lost along with its data.

I'm not sure about question 3.
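
To see the difference between cases 1 and 2 for yourself, here is a minimal PySpark sketch. It relies on two real Spark settings: spark.sql.catalogImplementation (which enableHiveSupport() sets to "hive") and spark.sql.warehouse.dir. The helper names catalog_settings and persist_demo, the app name, and the table name demo_table are made up for illustration, and the sketch assumes pyspark is installed (with Hive classes on the classpath for the Hive case).

```python
def catalog_settings(hive_support, warehouse_dir="spark-warehouse"):
    """Return the Spark settings that decide where table metadata and data go.

    Hypothetical helper for illustration; the config keys are Spark's own.
    """
    return {
        # "hive" is the value enableHiveSupport() sets;
        # "in-memory" is used when Hive support is not enabled.
        "spark.sql.catalogImplementation": "hive" if hive_support else "in-memory",
        # Directory where managed table data files are written.
        "spark.sql.warehouse.dir": warehouse_dir,
    }


def persist_demo(hive_support, warehouse_dir="spark-warehouse"):
    """Persist a tiny table and print which catalog the session is using."""
    # Imported here so the settings helper can be used without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("catalog-demo")
    for key, value in catalog_settings(hive_support, warehouse_dir).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Write a three-row managed table named demo_table.
    spark.range(3).write.mode("overwrite").saveAsTable("demo_table")

    print(spark.conf.get("spark.sql.catalogImplementation"))
    print([t.name for t in spark.catalog.listTables()])
    spark.stop()
```

Calling persist_demo(False) should write the data files under the warehouse directory while keeping the table definition only in the session's in-memory catalog, so the table is not visible to a new session. With persist_demo(True) and no hive-site.xml on the classpath, Spark is expected to create a local Derby metastore (a metastore_db directory) in the working directory; with hive-site.xml present, it should register the table in the existing metastore instead.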

Upvotes: 0
