Reputation: 25
I have Hive table partitioned based on date yyyy-mm-dd. I want to run a script everyday that can delete all the partitions older than 30 days but in this case I don't want to pass the date. I want it to take system date and delete the partitions older than 30 days.
Upvotes: 1
Views: 5481
Reputation: 1484
You need to use Linux/ Unix to set the variable for the DROP PARTITION date and use it in the ALTER TABLE statement.
Here is an example
CREATE TABLE ramesh.test
(col1 STRING,
col2 STRING)
PARTITIONED BY (partition_date date);
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-10-01') VALUES ('key1', 'val1');
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-10-02') VALUES ('key2', 'val2');
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-11-01') VALUES ('key3', 'val3');
hive --hivevar var_drop_date="$(date -d "30 days ago" +"%Y-%m-%d")" -e 'ALTER TABLE ramesh.test DROP IF EXISTS PARTITION (partition_date <= date "${hivevar:var_drop_date}")'
This drops the 2 partitions from 30 days ago and leaves the one within 30 days.
Let me know whether it works for you. I used CDH for testing this.
Upvotes: 6