user2998990
user2998990

Reputation: 980

Insert data of 2 Hive external tables in new External table with additional column

I have 2 external hive tables as follows. I have populated data in them from oracle using sqoop.

create external table transaction_usa
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_usa';

create external table transaction_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_canada';

Now i want to merge above 2 tables data as it is in 1 external hive table with all same fields as in the above 2 tables but with 1 extra column to identify that which data is from which table. The new external table with additional column as source_table. The new external table is as follows.

create external table transaction_usa_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int,
source_table string
)
row format delimited
stored as textfile
location '/user/gds/bank_ds/tran_usa_canada';

how can I do it.?

Upvotes: 0

Views: 1697

Answers (3)

Farooque
Farooque

Reputation: 3766

You do SELECT from each table and perform UNION ALL operation on these results and finally insert the result into your third table.

Below is the final hive query:

INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_usa' AS source_table FROM transaction_usa
UNION ALL
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_canada' AS source_table FROM transaction_canada;

Hope this help you!!!

Upvotes: 1

ZeusNet
ZeusNet

Reputation: 720

You could use the INSERT INTO Clause of Hive

INSERT INTO TABLE table transaction_usa_canada 
SELECT tran_id, acct_id, tran_date, ...'transaction_usa' FROM transaction_usa;

INSERT INTO TABLE table transaction_usa_canada 
SELECT tran_id, acct_id, tran_date, ...'transaction_canada' FROM transaction_canada;

Upvotes: 0

shankarsh15
shankarsh15

Reputation: 1967

You can very well do it by manual partitioning as well.

CREATE TABLE transaction_new_table (
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
PARTITIONED BY (sourcetablename String)

Then run below command,

load data inpath 'hdfspath' into table transaction_new_table   partition(sourcetablename='1')

Upvotes: 0

Related Questions