alejomarchan
alejomarchan

Reputation: 368

Overwrite multiple partitions at once Hadoop

I have a partitioned external table Hive that i have to overwrite with some records. There are a lot of dates that we need to reload and the queries are a bit heavy. What we want to know is if it is possible, in a simultaneous way load two or more different partitions at the same time? For example, 3 (or more) processes running in parallel like:

Process1

insert overwrite table_prod partition (data_date)
select * from table_old where data_date=20221110;

Process2

insert overwrite table_prod partition (data_date)
select * from table_old where data_date=20221111;

Process3

insert overwrite table_prod partition (data_date)
select * from table_old where data_date=20221112;

Upvotes: 1

Views: 717

Answers (1)

Koushik Roy
Koushik Roy

Reputation: 7387

Short answer is yes you can.

Real question is how - because you have to consider large volume of data.

Option 1 - yes, you can use shell script or some scheduler tool. But the query you're using is going to be slow. you can use static partitioning which is way faster.

insert overwrite table_prod partition (data_date=20221110) -- pls note i mentioned partition value.
select 
col1, col2... -- exclude data_date column from select list
from table_old where data_date=20221110;

Option 2 - You can also use dynamic partition scheme to load all the partitions at once. This is perf intensive operation but you dont have to create any shell script or any other process.

insert overwrite table_prod partition (data_date)
select * from table_old 

Upvotes: 1

Related Questions