Reputation: 343

Add partitions on existing hive table

I'm processing a large Hive table (more than 500 billion records). The processing is too slow and I would like to speed it up. I think partitioning the table would make the process more efficient.

Can anybody tell me how to do that? Note that my table already exists.

My table :

create table T(
nom string,
prenom string,
...
date string)

I want to partition on the date field.

Thanks

Upvotes: 5

Answers (2)

Durga Viswanath Gadiraju

Reputation: 3956

You have to restructure the table. Here are the steps:

1. Make sure no other process is writing to the table.
2. Create a new external table that uses partitioning.
3. Insert into the new table by selecting from the old table.
4. Drop the new (external) table. Only the table definition is dropped; the data stays in place.
5. Drop the old table.
6. Create the table with the original name, pointing to the location used in step 2.
7. Run the repair command (MSCK REPAIR TABLE) to fix the partition metadata.
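The steps above can be sketched in HiveQL roughly as follows. This is only an illustration under stated assumptions: the table name T_part and the location /data/t_part are hypothetical, the columns elided as "..." in the question are left out, and date is back-quoted because recent Hive versions treat it as a reserved keyword (older versions may not need the quotes).

```sql
-- Step 2: new external, partitioned table.
-- T_part and '/data/t_part' are hypothetical names chosen for this sketch.
CREATE EXTERNAL TABLE T_part (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING)
LOCATION '/data/t_part';

-- Step 3: a single insert job that creates the partitions dynamically.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE T_part PARTITION (`date`)
SELECT nom, prenom, `date`   -- the partition column goes last
FROM T;

-- Step 4: dropping an external table removes only its metadata;
-- the files under /data/t_part stay where they are.
DROP TABLE T_part;

-- Step 5: drop the old table (for a managed table this also deletes its data).
DROP TABLE T;

-- Step 6: recreate T on top of the files written in step 3.
CREATE EXTERNAL TABLE T (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING)
LOCATION '/data/t_part';

-- Step 7: register the existing partition directories in the metastore.
MSCK REPAIR TABLE T;
```

It is probably worth checking row counts on T_part before running step 5, since dropping a managed table is not reversible.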

Alternative to steps 4, 5, 6 and 7:

1. Create the table with the original name by running SHOW CREATE TABLE on the new table and replacing its name with the original table name.
2. Run the LOAD DATA INPATH command to move the files under the new table's partitions into the corresponding partitions of that table.
3. Drop the external table created in step 2 above.
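A rough sketch of this alternative, assuming the partitioned external table from step 2 is called T_part and lives under /data/t_part (both hypothetical), and that the old table is first moved out of the way so its name can be reused. The rename and the partition value '2015-01-01' are assumptions made for illustration, not something the answer specifies.

```sql
-- Print the DDL of the partitioned table so it can be copied
-- and edited to use the original table name.
SHOW CREATE TABLE T_part;

-- Assumption: free the original name by renaming the old table.
ALTER TABLE T RENAME TO T_old;

-- Recreate T from the edited DDL (elided "..." columns omitted here).
CREATE TABLE T (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING);

-- Move the files of one partition directory into the matching
-- partition of T. Repeat for each partition; the value is an example.
LOAD DATA INPATH '/data/t_part/date=2015-01-01'
INTO TABLE T PARTITION (`date` = '2015-01-01');

-- Finally drop the external table; only its metadata is removed.
DROP TABLE T_part;
```

LOAD DATA INPATH moves files rather than copying them, so no second MapReduce job is needed, but one statement per partition is required.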

Both approaches restructure the table with a single insert (MapReduce) job.

Upvotes: 2

Matt

Reputation: 107

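SET hive.exec.dynamic.partition = true;

SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE T_part PARTITION (date) SELECT nom, prenom, ..., date FROM T;

Note: T_part is a placeholder name for a new table that has already been created with date as its partition column; an insert cannot partition the original unpartitioned table in place. In the insert statement for a partitioned table, make sure the partition columns come last in the SELECT clause.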
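After the insert finishes, the result can be checked along these lines. This is a minimal sketch that assumes the partitioned target table is named T_part (a hypothetical name) and uses an example date value; date is back-quoted because recent Hive versions reserve the word.

```sql
-- List the partitions that the dynamic-partition insert created.
SHOW PARTITIONS T_part;

-- A filter on the partition column lets Hive read only the
-- matching partition directory instead of scanning the whole table.
SELECT COUNT(*)
FROM T_part
WHERE `date` = '2015-01-01';
```

Queries benefit from the partitioning only when they filter on the partition column, which is the main reason for choosing date here.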

Upvotes: 3
