Dipak
Dipak

Reputation: 39

How do I achieve update in hive in older version

Lets say I have a table with the below data:

Col1 Col2 Col3
Dipak More 123
Sachin Patil 123
Pradip More 999

I want to update the Dipak with 222 values in Col3 So as hive won't allow you to update the files.So how we can achieve in hive

Upvotes: 1

Views: 1009

Answers (4)

Kishor Baindoor
Kishor Baindoor

Reputation: 111

Here is my solution/work around if you are using old hive version. This works better when you have large data in target table which we can't drop and recreate with full data every time.

create one more table say delete_keys table. This will hold all the key from main table which are deleted along with its surrogate key.

While loading incremental data into main table, do a left join with main table. For all the matching records, we ideally should update the main table. But instead, we take keys (along with surrogate key) from main table for all matching records and insert that to delete_keys table. Now we can insert all delta records into main table as it irrespective of whether they are to be updated or inserted.

Create view on main table using delete-keys table so that matching keys with delete-keys table are not fetched. So, this view will be final target table. This view will not show records from main table which are updated with latest records.

Upvotes: 0

IamDataEngineer
IamDataEngineer

Reputation: 45

Here table t2 is an updated table where records get an update and table t1 holds history records.

Below hive query will insert only updated records not already existing records

Record details :-

hive> select * from t1;

OK

10  aaa ny

20  bbb ny

Time taken: 0.02 seconds, Fetched: 2 row(s)

hive> select * from t2;

OK

20  bbb ny

10  aaa SFO .   -- updated record

Time taken: 0.023 seconds, Fetched: 2 row(s)

Query:-

insert into table t1 select c.eid as eid,c.ename as ename,c.loc as loc from t2 c where c.eid in (select a.eid from t1 a left outer join t2 b on a.eid=b.eid and a.loc= b.loc where b.loc is null);

Output:-

hive> select * from t1;

OK

10  aaa ny

20  bbb ny

10  aaa SFO -- updated in t1 table

Time taken: 0.015 seconds, Fetched: 3 row(s)

hive> 

Upvotes: 1

David דודו Markovitz
David דודו Markovitz

Reputation: 44921

In any case you'll have to rebuild the whole data set.

I would suggest using CTAS (Create Table As Select) to create a table with the requested data and then renaming the tables.
In this way you'll have the ability to rollback if something go wrong.

create table mytable_tmp
as
select  Col1
       ,Col2
       ,case when Col1 = 'Dipak' then 222 else Col3 end

from    mytable
;

alter table mytable rename to mytable_bck_20170311
;

alter table mytable_tmp rename to mytable
;

You can do it in one move, but bear in mind that if something goes wrong, E.g. typo in your query - your data is gone.

insert overwrite table mytable

select  Col1
       ,Col2
       ,case when Col1 = 'Dipak' then 222 else Col3 end

from    mytable

Upvotes: 0

lucas kim
lucas kim

Reputation: 910

If you are using the old version of hive, you can only reload the data using the insert overwrite table statement.

If you want to use an update or delete statement, you can use hive 0.14 or later.

Upvotes: 0

Related Questions