Shakile

Reputation: 343

Add partitions to an existing Hive table

I'm processing a large Hive table (more than 500 billion records). The processing is too slow and I would like to speed it up. I think that adding partitions would make the process more efficient.

Can anybody tell me how I can do that? Note that my table already exists.

My table:

create table T(
nom string,
prenom string,
...
date string)

I want to partition on the date field.

Thx

Upvotes: 5

Views: 28475

Answers (2)

Durga Viswanath Gadiraju

Reputation: 3956

You have to restructure the table. Here are the steps:

  1. Make sure no other process is writing to the table.
  2. Create a new external table that is partitioned on the desired column.
  3. Insert into the new table by selecting from the old table.
  4. Drop the new (external) table. Only the table definition is dropped; the data files stay where they are.
  5. Drop the old table.
  6. Create the table under the original name, pointing it at the location used in step 2.
  7. Run the repair command (MSCK REPAIR TABLE) to register the partitions in the metastore.

Alternative to steps 4, 5, 6 and 7:

  1. Create the table under the original name by running SHOW CREATE TABLE on the new table and replacing its name with the original table name.
  2. Run LOAD DATA INPATH to move the files under the external table's partitions into the corresponding partitions of the new table.
  3. Drop the external table created earlier.

Both approaches achieve the restructuring with a single insert (one MapReduce job).
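The first sequence of steps could look roughly like the HiveQL below. This is only a sketch under several assumptions: the table is reduced to the columns nom, prenom and date (the columns elided in the question are left out), T_part and the location /data/T_part are hypothetical names, and the default storage format is used. Check row counts in the new table before dropping anything.

```sql
-- Step 2: new external table, partitioned on date.
-- The partition column is declared in PARTITIONED BY, not in the column list.
-- Backticks are used because date is a reserved word in recent Hive versions.
CREATE EXTERNAL TABLE T_part (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING)
LOCATION '/data/T_part';            -- hypothetical HDFS path

-- Step 3: copy the data, letting Hive create one partition per date value.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE T_part PARTITION (`date`)
SELECT nom, prenom, `date`          -- partition column goes last
FROM T;

-- Step 4: dropping an external table removes only its metadata.
DROP TABLE T_part;

-- Step 5: if T is a managed table, this also deletes its old data files.
DROP TABLE T;

-- Step 6: recreate T on top of the partitioned files written in step 3.
CREATE EXTERNAL TABLE T (
  nom    STRING,
  prenom STRING
)
PARTITIONED BY (`date` STRING)
LOCATION '/data/T_part';

-- Step 7: scan the location and register the date=... directories as partitions.
MSCK REPAIR TABLE T;
```

Recreating T as an external table keeps the data in place if the table is dropped again later; whether that is desirable depends on how the table is managed.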

Upvotes: 2

Matt

Reputation: 107

 SET hive.exec.dynamic.partition = true;

SET hive.exec.dynamic.partition.mode = nonstrict;

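INSERT OVERWRITE TABLE partitioned_table PARTITION (date) SELECT nom, prenom, ..., date FROM T;

Note: partitioned_table stands for a new table created with PARTITIONED BY on date, and T is the existing unpartitioned table. In the INSERT statement for a partitioned table, make sure the partition columns come last in the SELECT clause.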
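With a table this large, the dynamic-partition insert may hit Hive's default partition limits, and it is worth checking the result afterwards. The sketch below assumes a partitioned target table hypothetically named T_part; the limit values and the date literal are purely illustrative and depend on how many distinct dates the data contains and how they are formatted.

```sql
-- Before the INSERT: raise the dynamic-partition limits if the data
-- contains more distinct dates than the defaults allow (values are illustrative).
SET hive.exec.max.dynamic.partitions = 10000;
SET hive.exec.max.dynamic.partitions.pernode = 1000;

-- ... run the INSERT OVERWRITE ... PARTITION statement here ...

-- After the INSERT: list the partitions that were created.
SHOW PARTITIONS T_part;

-- A filter on the partition column lets Hive read only the matching
-- partition directory instead of scanning the whole table.
SELECT COUNT(*)
FROM T_part
WHERE `date` = '2015-01-01';        -- assumed date format
```

Queries only benefit from the partitioning when they filter on the partition column, so this is worth confirming against the actual workload.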

Upvotes: 3
