shashank
shashank

Reputation: 410

Hive Updates using Informatica

My Target is Hive table using Informatica ETL tool.

Updates are not supported in Hive earlier versions. So how should i do updates to records in this scenario. Is it ok to go for Hive update feature using Hive ACID and transaction feature.

Upvotes: 0

Views: 629

Answers (3)

akshat thakar
akshat thakar

Reputation: 1527

Updates are not best option while working on Hive, creating intermediate temporary tables is better design. Steps to update existing Hive table-

  1. Suppose you are working on table- tbl_employee and want to update few rows.
  2. Create new intermediate table - tbl_employee_update with exactly same number of records with updated column data.
  3. Create new intermediate table which has data from original table- tbl_employee minus rows which are already present in tbl_employee_update. Name this table as tbl_employee_retained
  4. Do union on tbl_employee_retained and tbl_employee_update and name it as tbl_employee_temp
  5. Now replace tbl_employee with tbl_employee_temp by first ensuring tbl_employee_temp is having records and other sanity checks.

Upvotes: 1

Volamr
Volamr

Reputation: 66

Informatica does supports Updates to hive tables from Informatica 9.6 HF3 version provided the tables support ACID, for more information you can refer to this link (https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions), but instead of doing this, I would rather do this a in a two step process,

1) Identify all the records which exist only in the target and the records which exist ONLY in the stage data 2) Merge these two and load them into a temporary table. 3) Finally re-name temporary table to the actual target table name

The above would work only SCD type 1 implementations

Upvotes: 1

jpopesculian
jpopesculian

Reputation: 712

You should look into event sourcing (https://msdn.microsoft.com/en-us/library/dn589792.aspx).

Think of your database as storing events instead of items. So if you have some counter object you want to use in your database, instead of updating counter from 0 to 1 to 2, etc, you simply insert a new document whenever you increment and then take the sum/count of those documents.

Upvotes: 0

Related Questions