Ram
Ram

Reputation: 63

How to update some columns of a table in Hive with the data of other table

I want to update data for some columns in one table from another table.

For these three columns cf_mng,cf_sds,cf_htg in the table cust_tabl there is no data.

I want to update data for these three columns cf_mng,cf_sds,cf_htg of cust_tabl with the data of the three columns cust_cd_cnt_1,cust_cd_cnt_2,cust_cd_cnt_3 of custom_hist table.

This table is having data from 201505 to 201509.

CREATE TABLE custom_hist( 
 cust_no varchar(20),    
 cust_cd_cnt_1 float,  
 cust_cd_cnt_2 float,  
 cust_cd_cnt_3 float,  
 cust_dt date,
 cust_name string) 
 PARTITIONED BY (yyyymm int);

This table is having data from 201403 to 201606.

CREATE TABLE cust_tabl(
cust_no string,  
cf_mng double,  
cf_sds double,  
cf_htg double,  
cust_loc string,  
cust_region string,  
cust_country string,
cust_reg_id smallint)
PARTITIONED BY (yyyymm int); 

Please help me.

Upvotes: 2

Views: 1821

Answers (1)

leftjoin
leftjoin

Reputation: 38290

Join your tables by primary key and overwrite joined partitions. Check the primary key. The join cardinality should be 1:1 or 1:0 otherwise you should apply some row_number or rank or some aggregation like max() to limit rows after join:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

insert overwrite table cust_tabl partition (yyyymm)
select 
      c.cust_no,
      coalesce(h.cust_cd_cnt_1,c.cf_mng) as cf_mng, --take history column if joined
      coalesce(h.cust_cd_cnt_2,c.cf_sds) as cf_sds, --take original if not joined
      coalesce(h.cust_cd_cnt_3,c.cf_htg) as cf_htg,
      c.cust_loc,  --original columns
      c.cust_region,
      c.cust_country,
      c.cust_reg_id,
      c.yyyymm     --partition is the last
  from cust_tabl c
       left join custom_hist h 
                 --assume this is the primary key:
                 on c.cust_no = h.cust_no and c.yyyymm = h.yyyymm;

Upvotes: 2

Related Questions