Alan
Alan

Reputation: 469

Combine two partitioned tables into one table but into two different partitions

I am new in Hive and appreciate if someone can help with a hive query that I am dealing with.

There are two tables A and B with exactly same schema but different datas with 4 partitions. I need to combine these two tables into one table with (4 + 1 = 5) partitions. The added partitions tells which tables the data is coming from. For example, let's say, the new partition is named as "source". If the data is coming from table A the source will be equal to "from_A" and if the data is coming from table B the source will be equal to "from_B".

hive> desc A;
OK
col1 string,
col2 string,
DD   string,                                    
EE   string,                                    
FF   string,                                    
GG   string 

# Partition Information      
# col_name              data_type               

DD              string                                      
EE              string                                      
FF                  string                                      
GG              string 

and

hive> desc B;
OK
col1 string,
col2 string,
DD   string,                                    
EE   string,                                    
FF   string,                                    
GG   string

# Partition Information      
# col_name              data_type               comment             

DD              string                                      
EE              string                                      
FF                  string                                      
GG              string

Upvotes: 1

Views: 384

Answers (1)

leftjoin
leftjoin

Reputation: 38325

Create new partitioned table

Create table C (
col1 string,
col2 string
)
partitioned by (
source string,
DD   string,                                    
EE   string,                                    
FF   string,                                    
GG   string
);

Then load data in to new table:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table C partition(source,DD,EE,FF,GG)
select col1, col2, 
       --partitions
      'from_A' source, DD, EE, FF, GG 
  from A
distribute by DD, EE, FF, GG;

And in parallel load data from table B:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table C partition(source,DD,EE,FF,GG)
select col1, col2, 
      --partitions
      'from_B' source, DD, EE, FF, GG 
 from B
distribute by DD, EE, FF, GG;

Upvotes: 1

Related Questions