Reputation: 1387
I run script X in Hive. I want to store a date that records when the script X last ran. Every time script X runs, it should pick up the last date it ran, increment it by seven days and use this value in its query, then update the last date ran to this new value. (Ideally this script X should be run N number of times until the date value reaches today's date. So if the date reaches 2014-02-16, it can stop because the next date is 2014-02-23 which is a future date.)
I'm planing to do this by manually creating a table that will store just the initial start date and keep getting updated-
create table IF NOT EXISTS date_last_run(last_run_date timestamp);
insert OVERWRITE TABLE date_last_run select '2014-05-01 00:00:00' AS last_run_day from date_last_run
-- I tried the above insert but it doesn't return an error and doesn't insert the data. How can insert a date in this table?
In the script I will write-
insert OVERWRITE TABLE date_last_run select last_run_day + integer '7' from date_last_run
--(Again I'm not sure if the above would work)
--Use last_run_day in the script
Is this an okay solution?
Upvotes: 0
Views: 410
Reputation: 2725
That might work, but a more robust solution would be to use Apache Oozie to handle the instances for you. You can include your Hive script in an Oozie workflow, and then schedule it to run weekly in an Oozie coordinator.
If you need to run past dates you can change the start
property of the coordinator to start in the past, and the instances will catch up to the current date automatically.
The date being run each time can be passed to the workflow by using a property
in the coordinator action
, and then passed from the workflow to the Hive script by using a param
in the Hive action. In the Hive script you can refer to the date parameter using ${parameter_name}
.
Upvotes: 1