Nan

Reputation: 339

Is it possible to do an upsert in a Hive ACID-enabled ORC table?

After enabling ACID support in Hive, I can insert, update, and delete rows in a Hive table (an ORC file-based table). Is it possible to do an upsert? When users provide data, they don't specify whether a row is an insert or an update. If we have to delete before inserting every row to simulate an upsert, it might be very slow, though I haven't measured that yet. Our update/insert rate is low, less than 5%. For us, latency is not important, but throughput certainly matters. If Hive doesn't currently support upsert, is there any plan to support it? Thanks.

Upvotes: 1

Views: 2359

Answers (2)

dassum

Reputation: 5113

One approach could be to use a Hive JDBC connection to perform a delete followed by an insert, or a merge, on the Hive table, and then run a full compaction on the Hive ORC table. Once the compaction has finished, the data is available through Spark.
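A minimal sketch of the merge-then-compact idea, assuming a Hive version that provides the MERGE statement on ACID tables and assuming that "full compaction" corresponds to a major compaction request. The helpers only build HiveQL strings, which could then be sent over whatever connection you use (for example JDBC). The table and column names are hypothetical.

```python
def merge_upsert_sql(target, staging, key, columns):
    """Build a HiveQL MERGE that updates matching rows and inserts the rest.

    `columns` lists every target column in table order, including `key`.
    """
    # The join key itself is not reassigned in the UPDATE branch.
    set_clause = ", ".join(f"{c} = s.{c}" for c in columns if c != key)
    values = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} AS t USING {staging} AS s "
        f"ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT VALUES ({values})"
    )


def major_compaction_sql(target):
    """Build the statement that requests a major compaction of `target`."""
    return f"ALTER TABLE {target} COMPACT 'major'"


# Hypothetical table and column names, for illustration only.
statements = [
    merge_upsert_sql("customers", "customers_staging", "id", ["id", "name", "email"]),
    major_compaction_sql("customers"),
]
for sql in statements:
    print(sql)
```

The incoming rows are assumed to be loaded into the staging table first, so a single MERGE handles both the insert and the update case without the caller having to say which is which.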

Upvotes: 0

leftjoin

Reputation: 38335

Work on this is in progress: https://issues.apache.org/jira/browse/HIVE-10924. Throughput will probably be limited in ACID mode.

Currently, you can simulate an upsert outside ACID mode by full-joining the existing data with the new data and rewriting the whole partition or table. Latency is rather high, but throughput is virtually unlimited. See here: https://stackoverflow.com/a/37744071/2700344
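A rough sketch of the full-join rewrite, not taken from the linked answer. The HiveQL is held in a string, and the small function models the same logic in plain Python so the row-level behaviour is easy to check. The names `target`, `staging`, `id` and `val` are hypothetical, and ids are assumed to be unique on each side.

```python
# HiveQL for the rewrite; `target`, `staging`, `id` and `val` are hypothetical names.
FULL_JOIN_UPSERT = """
INSERT OVERWRITE TABLE target
SELECT COALESCE(s.id, t.id) AS id,
       CASE WHEN s.id IS NOT NULL THEN s.val ELSE t.val END AS val
FROM target t
FULL OUTER JOIN staging s ON t.id = s.id
"""


def full_join_upsert(target_rows, staging_rows):
    """Model the query above on lists of (id, val) tuples with unique ids."""
    target = dict(target_rows)
    staging = dict(staging_rows)
    merged = []
    # A full outer join visits every key present on either side.
    for key in sorted(target.keys() | staging.keys()):
        # Prefer the staging row when one exists, else keep the old row.
        val = staging[key] if key in staging else target[key]
        merged.append((key, val))
    return merged


print(full_join_upsert([(1, "a"), (2, "b")], [(2, "B"), (3, "c")]))
# prints [(1, 'a'), (2, 'B'), (3, 'c')]
```

Every row is rewritten regardless of how few changed, which is why latency is high while the job itself scales like any other batch rewrite.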

Upvotes: 2
