Ged

Reputation: 18023

Does Hive ORC ACID on Hive 3 require TEZ if not using Map Reduce?

My understanding was, and is, that with Hive 3 a Hive ORC ACID table using MERGE also needs at least Tez as the underlying execution engine, if neither MapReduce nor the Spark engine for Hive is used. In fact I am not convinced that Hive MERGE, UPDATE and DELETE work with the Spark engine at all.
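To make this concrete, here is the kind of statement I have in mind; a minimal sketch, assuming a transactional ORC table on Hive 3 (the table and column names are hypothetical):

```sql
-- Session settings commonly needed for ACID in Hive (real Hive properties;
-- whether your distribution pre-sets them varies):
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.execution.engine=tez;

-- Hypothetical transactional target table:
CREATE TABLE customers (id INT, name STRING)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- MERGE (like UPDATE and DELETE) is only valid against such ACID tables:
MERGE INTO customers AS t
USING customers_staging AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET name = s.name
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name);
```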

But I cannot confirm this from the documentation and the various release updates, hence this posting. It seems hard to find a coherent set of prose on this topic, and I am away from a cluster.

Also, I cannot follow the bold, italicized statement from https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-version-release claiming full transactional functionality, as I was not aware that Spark could UPDATE or DELETE on Hive ORC ACID (yet):

Apache Spark

Apache Spark gets updatable tables and ACID transactions with Hive Warehouse Connector. Hive Warehouse Connector allows you to register Hive transactional tables as external tables in Spark to access full transactional functionality. Previous versions only supported table partition manipulation. Hive Warehouse Connector also supports Streaming DataFrames for streaming reads and writes into transactional and streaming Hive tables from Spark.

Spark executors can connect directly to Hive LLAP daemons to retrieve and update data in a transactional manner, allowing Hive to keep control of the data.

Apache Spark on HDInsight 4.0 supports the following scenarios:

- Run machine learning model training over the same transactional table used for reporting.
- Use ACID transactions to safely add columns from Spark ML to a Hive table.
- Run a Spark streaming job on the change feed from a Hive streaming table.
- Create ORC files directly from a Spark Structured Streaming job.

You no longer have to worry about accidentally trying to access Hive transactional tables directly from Spark, resulting in inconsistent results, duplicate data, or data corruption. In HDInsight 4.0, Spark tables and Hive tables are kept in separate Metastores. Use Hive Data Warehouse Connector to explicitly register Hive transactional tables as Spark external tables.

Upvotes: 0

Views: 466

Answers (1)

Ged

Reputation: 18023

The bold, italicized statement above is not correct.

https://issues.apache.org/jira/browse/SPARK-15348 states clearly that Spark does not support Hive ORC ACID processing.

MapReduce is disappearing on various cloud platforms and Tez is the default engine now, so Sqoop and Hive ORC ACID use it; hence Tez is at least required.
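You can verify what a session will actually run on from beeline; a minimal sketch (the property name is a real Hive setting):

```sql
-- Print the engine for the current session; on Hive 3 this defaults to tez,
-- and setting it to mr only draws a deprecation warning:
SET hive.execution.engine;
SET hive.execution.engine=tez;
```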

NB: I only asked this question because this discussion came up from people 'upstairs' on my last assignment.

Upvotes: 0
