Reputation: 547
I run a particular query every week that creates a weekly table of the required data.
The table names are in the format as mentioned below
Since this is a repetitive task I want to schedule this query so that it will automatically run on every monday of a particular week to get previous week's data.
The problem that I am facing is, I dont know how do I change the my table name dynamically as I run my query every week.
So when I run my query next time it should automatically create a table with name db_name.subscriptions_wk32
. I can get value 32 from weekofyear('2019-08-05')
but dont know how to put it in table name
Currently I write it as below
CREATE TABLE db_name.subscriptions_wk30 AS -- a hardcoded name
SELECT *
FROM ..........
What I want is
CREATE TABLE db_name.subscriptions_wkCAST(weekofyear('2019-08-05') AS varchar) -- a dynamic name
SELECT *
FROM ..........
Which will result into
CREATE TABLE db_name.subscriptions_wk32 SELECT * FROM ..........
PS I am using Hive/Hue as RDBMS
Upvotes: 2
Views: 1217
Reputation: 1270391
Don't do this! Having multiple parallel tables with the same structure is a really bad idea.
Instead, have a single table db_name.subscriptions
and add a column that specifies the week -- perhaps the first Monday or last Sunday of the week.
Then, instead of creating separate tables, just insert rows for each week.
You will find advantages to having a single table:
where
clause (which can be parameterized) rather than changing the from
clause (which cannot be).And making the weeks partitions in the same table is very useful, if each week produces a non-trivial number of rows.
Upvotes: 2
Reputation: 4678
You can use below shell to create a dynamic weekly table and can schedule it to run on every Monday using oozie scheduler or as a cron job.
#!/bin/bash
echo "Executing the hive query - get current week and store it in shell variable"
#current_week=$(hive -e "select weekofyear(current_date);")
#echo $current_week
previous_week=$(hive -e "select weekofyear(date_sub(current_date, 7));")
echo $previous_week
hive --hiveconf dbname=test_dev_db --hiveconf weekname=$previous_week -f hdfs://xxx.host.com:8020/user/xxx/dev/hadoop/hivescripts/createweektable.hql
echo "Executing the hive query - ends"
hive (test_dev_db)> desc test_dev_db.subscriptions_wk31;
OK
user_id int
country string
last_modified_date date
Time taken: 0.345 seconds, Fetched: 3 row(s)
Update- This is how you can refer your shell variable in your hql script.
CREATE TABLE ${hiveconf:dbname}.subscriptions_wk${hiveconf:weekname}
row format delimited
fields terminated by '|'
STORED AS ORC
AS select * from test_dev_db.test_data;
Upvotes: 2
Reputation: 38335
It is not possible to calculate table name in the query itself, but it is possible to pass parameter to the script. You can calculate parameter in a shell and execute script from shell:
#You can provide date:
varDate=2019-08-05
#And calculate weekyear
weeknumber=$(date --date=${varDate} +%V)
echo "${weeknumber}"
#returns 32
#Or calculate current date weekyear
weeknumber=`date +%V`
#Or calculate previous week date
weeknumber="$(date -d "7 days ago" +"%V")"
#And call hive script like this:
hive -e "CREATE TABLE db_name.subscriptions_wk${weeknumber} -- parametrized name suffix
SELECT *
FROM ...
"
Or you can use -hivevar
parameter in the hive command line to call script file (-f
option), suppose weeknumber
is already calculated like before:
hive -hivevar weeknumber="$weeknumber" -f script_file_name
Upvotes: 3