Reputation: 469
I am new in Hive and appreciate if someone can help with a hive query that I am dealing with.
There are two tables A and B with exactly same schema but different datas with 4 partitions. I need to combine these two tables into one table with (4 + 1 = 5) partitions. The added partitions tells which tables the data is coming from. For example, let's say, the new partition is named as "source". If the data is coming from table A the source will be equal to "from_A" and if the data is coming from table B the source will be equal to "from_B".
hive> desc A;
OK
col1 string,
col2 string,
DD string,
EE string,
FF string,
GG string
# Partition Information
# col_name data_type
DD string
EE string
FF string
GG string
and
hive> desc B;
OK
col1 string,
col2 string,
DD string,
EE string,
FF string,
GG string
# Partition Information
# col_name data_type comment
DD string
EE string
FF string
GG string
Upvotes: 1
Views: 384
Reputation: 38325
Create new partitioned table
Create table C (
col1 string,
col2 string
)
partitioned by (
source string,
DD string,
EE string,
FF string,
GG string
);
Then load data in to new table:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table C partition(source,DD,EE,FF,GG)
select col1, col2,
--partitions
'from_A' source, DD, EE, FF, GG
from A
distribute by DD, EE, FF, GG;
And in parallel load data from table B:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table C partition(source,DD,EE,FF,GG)
select col1, col2,
--partitions
'from_B' source, DD, EE, FF, GG
from B
distribute by DD, EE, FF, GG;
Upvotes: 1