Reputation: 343
I'm processing a large Hive table (more than 500 billion records). The processing is too slow and I would like to speed it up. I think that adding partitions could make the process more efficient.
Can anybody tell me how I can do that? Note that my table already exists.
My table :
create table T(
nom string,
prenom string,
...
date string)
I want to partition on the date field.
Thx
Upvotes: 5
Views: 28475
Reputation: 3956
You have to restructure the table. Here are the steps:
Alternative to steps 4, 5, 6 and 7:
Run show create table on the new table and replace the table name in its output with the original table name.
Run the LOAD DATA INPATH command to move the files under the existing partitions to the partitions of the new table.
Both approaches achieve the restructuring with a single insert/MapReduce job.
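The restructuring idea can be sketched in HiveQL roughly as follows. This is a minimal sketch under stated assumptions, not the exact procedure from the steps above: T_part and T_old are hypothetical names, the column list shows only the two columns visible in the question, and the final swap uses ALTER TABLE ... RENAME TO as a simpler stand-in for dropping and re-creating tables. The partition column is quoted with backticks because newer Hive versions treat date as a reserved keyword.

```sql
-- Hypothetical names: T_part (new partitioned table), T_old (backup of the original).
-- Only nom and prenom are listed; add the remaining non-partition columns of T
-- to both the CREATE TABLE and the SELECT below.

-- 1. Create the partitioned table. The partition column is declared in
--    PARTITIONED BY and must not be repeated in the regular column list.
CREATE TABLE T_part (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING);

-- 2. Let Hive derive the partition values from the data.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- 3. Copy the data in a single insert job, partition column last.
INSERT OVERWRITE TABLE T_part PARTITION (`date`)
SELECT nom, prenom, `date`
FROM T;

-- 4. After checking the new table, swap the names.
ALTER TABLE T RENAME TO T_old;
ALTER TABLE T_part RENAME TO T;
```

If the table contains many distinct dates, the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may need to be raised before step 3, since the insert fails when it would create more partitions than they allow.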
Upvotes: 2
Reputation: 107
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
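-- partitioned_table is a hypothetical name for the new partitioned table.
-- List every non-partition column first, then the partition column last.
INSERT OVERWRITE TABLE partitioned_table PARTITION (date) SELECT nom, prenom, date FROM table_name;
Note: in the INSERT statement for a partitioned table, make sure the partition columns come last in the SELECT clause. Hive matches dynamic partition columns by position, not by name.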
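Once the insert has finished, a quick check along these lines may help confirm that the partitions were created and that queries can use them. It is only a sketch: T_part is a hypothetical name for the partitioned table and the date literal is a made-up example value.

```sql
-- T_part and '2015-01-01' are hypothetical; substitute the real table name
-- and a date value that exists in the data.

-- List the partitions Hive created, one per distinct date value.
SHOW PARTITIONS T_part;

-- Inspect the plan of a query that filters on the partition column.
EXPLAIN
SELECT COUNT(*)
FROM T_part
WHERE `date` = '2015-01-01';
```

The speed-up from partitioning comes from queries like the second one: a filter on the partition column lets Hive read only the matching partition directory instead of scanning the whole table.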
Upvotes: 3