Reputation: 39
Lets say I have a table with the below data:
Col1 Col2 Col3
Dipak More 123
Sachin Patil 123
Pradip More 999
I want to update the Dipak
with 222
values in Col3
So as hive won't allow you to update the files.So how we can achieve in hive
Upvotes: 1
Views: 1009
Reputation: 111
Here is my solution/work around if you are using old hive version. This works better when you have large data in target table which we can't drop and recreate with full data every time.
create one more table say delete_keys table. This will hold all the key from main table which are deleted along with its surrogate key.
While loading incremental data into main table, do a left join with main table. For all the matching records, we ideally should update the main table. But instead, we take keys (along with surrogate key) from main table for all matching records and insert that to delete_keys table. Now we can insert all delta records into main table as it irrespective of whether they are to be updated or inserted.
Create view on main table using delete-keys table so that matching keys with delete-keys table are not fetched. So, this view will be final target table. This view will not show records from main table which are updated with latest records.
Upvotes: 0
Reputation: 45
Here table t2 is an updated table where records get an update and table t1 holds history records.
Below hive query will insert only updated records not already existing records
hive> select * from t1;
OK
10 aaa ny
20 bbb ny
Time taken: 0.02 seconds, Fetched: 2 row(s)
hive> select * from t2;
OK
20 bbb ny
10 aaa SFO . -- updated record
Time taken: 0.023 seconds, Fetched: 2 row(s)
insert into table t1 select c.eid as eid,c.ename as ename,c.loc as loc from t2 c where c.eid in (select a.eid from t1 a left outer join t2 b on a.eid=b.eid and a.loc= b.loc where b.loc is null);
hive> select * from t1;
OK
10 aaa ny
20 bbb ny
10 aaa SFO -- updated in t1 table
Time taken: 0.015 seconds, Fetched: 3 row(s)
hive>
Upvotes: 1
Reputation: 44921
In any case you'll have to rebuild the whole data set.
I would suggest using CTAS (Create Table As Select) to create a table with the requested data and then renaming the tables.
In this way you'll have the ability to rollback if something go wrong.
create table mytable_tmp
as
select Col1
,Col2
,case when Col1 = 'Dipak' then 222 else Col3 end
from mytable
;
alter table mytable rename to mytable_bck_20170311
;
alter table mytable_tmp rename to mytable
;
You can do it in one move, but bear in mind that if something goes wrong, E.g. typo in your query - your data is gone.
insert overwrite table mytable
select Col1
,Col2
,case when Col1 = 'Dipak' then 222 else Col3 end
from mytable
Upvotes: 0
Reputation: 910
If you are using the old version of hive, you can only reload the data using the insert overwrite table statement.
If you want to use an update or delete statement, you can use hive 0.14 or later.
Upvotes: 0