Reputation: 996
I have a table that is partitioned by date in the format yyyyMMdd. If I do a simple query like this:
SELECT COUNT(*) FROM MyTable WHERE Date >= '20140924'
then it will scan 3 days of data (today is the 26th). However I would like my query to always look at the last 3 days, so I write it like this
SELECT COUNT(*) FROM MyTable
WHERE date >= from_unixtime(unix_timestamp() - 259200, 'yyyyMMdd')
The problem is that now it scans every partition. Is there a way to get it to pre-calculate the part of the query after the greater than?
Upvotes: 2
Views: 169
Reputation: 32392
One workaround is to create a table with a single row where you can select your calculated time from.
CREATE TABLE dual (dummy STRING);
INSERT INTO TABLE dual SELECT count(*) FROM dual;
SELECT COUNT(*) FROM MyTable t1
JOIN (
SELECT from_unixtime(unix_timestamp() - 259200, 'yyyyMMdd') myDate
FROM dual LIMIT 1
) t2 ON (t2.myDate = t1.Date)
Upvotes: 0
Reputation: 3845
Unfortunately Hive doesn't support that. I have faced the similar problem in past and my hive table had partitions for last 2 years.
However one workaround you can do is that you can run this hive query inside a shell script where you will compute this date in another variable and use it as a variable in hive query. An example script would be:
#!/bin/bash
date=`date +"%Y%m%d" -d "-3 days"`
hive -e "select count(*) from MyTable where date >= '$date'"
Upvotes: 1