Reputation: 69
We are receiving hourly JSON data into HDFS. The data is approximately 5-6 GB per hour.
When a matching record is found in the final table, update (or delete) it.
If the record is not matched in the final dataset, insert it.
We have tried the Hive MERGE option for this use case, but it takes more than an hour to process the merge operation in Hive. Is there any alternative approach for this use case? Basically, every day we add about 150 GB of data into Hive, and every day we have to scan that 150 GB to find out whether each record needs an update or an insert (roughly the full-scan merge sketched below).
What is the best way to do upserts (updates and inserts) in Hadoop for a large dataset: Hive, HBase, or NiFi? What would the flow be?
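For illustration, the logic we need is roughly the following full-scan merge, expressed here as a Spark sketch (table, column, and path names are made up; deletes would additionally need a tombstone column, which is omitted here):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("full-scan-upsert")
  .enableHiveSupport()
  .getOrCreate()

// Hypothetical names: final_table holds the merged history,
// the JSON path is one hourly drop.
val existing = spark.table("final_table")
val hourly   = spark.read.json("/data/incoming/dt=2019-01-01/hour=09")

// Keep the newest version of each record key: incoming rows replace
// matched rows (the update), and brand-new keys are kept (the insert).
val latestFirst = Window.partitionBy("record_id").orderBy(col("event_ts").desc)
val merged = existing.unionByName(hourly)
  .withColumn("rn", row_number().over(latestFirst))
  .filter(col("rn") === 1)
  .drop("rn")

// The expensive part: the entire final table is read and rewritten on
// every run, so the cost grows with total table size, not delta size.
merged.write.mode(SaveMode.Overwrite).saveAsTable("final_table_merged")
```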
Upvotes: 2
Views: 142
Reputation: 328
We are using Uber's Hoodie library (now Apache Hudi) for a similar use case. It uses Spark with partitioning and a bloom-filter index for faster merging, and it supports Hive and Presto.
The DeltaStreamer tool can be used for quick setup and initial testing.
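As a rough sketch (table name, record key, and paths here are hypothetical), a Hudi upsert from Spark looks something like this; the format identifier was com.uber.hoodie before the project moved to Apache:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("hudi-upsert")
  .enableHiveSupport()
  .getOrCreate()

// One hourly JSON drop (hypothetical path).
val hourly = spark.read.json("/data/incoming/dt=2019-01-01/hour=09")

// Upsert into the Hudi-managed table: keys that already exist are
// updated, new keys are inserted. The bloom-filter index lets Hudi
// rewrite only the affected files instead of scanning the whole table.
hourly.write
  .format("org.apache.hudi")                                        // "com.uber.hoodie" in older releases
  .option("hoodie.table.name", "final_table")                       // hypothetical table name
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "record_id")   // unique business key
  .option("hoodie.datasource.write.precombine.field", "event_ts")   // latest record wins per key
  .option("hoodie.datasource.write.partitionpath.field", "dt")      // partition column
  .mode(SaveMode.Append)
  .save("/warehouse/hudi/final_table")
```

With Hive sync enabled (the hoodie.datasource.hive_sync.* options), the resulting table can then be queried from Hive and Presto.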
Upvotes: 1