Reputation: 1357
I have a table (table1) that is partitioned by date. It could look something like this:
╔════════════╦═══════╦══════╗
║ date ║ col1 ║ col2 ║
╠════════════╬═══════╬══════╣
║ 2020-06-16 ║ Jeff ║ 3 ║
║ 2020-06-16 ║ Bob ║ 148 ║
║ 2020-06-16 ║ Karen ║ 101 ║
║ 2020-06-15 ║ Joy ║ 9 ║
║ 2020-06-15 ║ Joe ║ 33 ║
║ 2020-06-12 ║ Kevin ║ 48 ║
║ 2020-06-12 ║ Raj ║ 100 ║
║ 2020-06-12 ║ Karen ║ 9 ║
║ ... ║ ... ║ ... ║
╚════════════╩═══════╩══════╝
I want to run the query SELECT count(*) FROM table1 GROUP BY col1 HAVING count(*) > 1, but let's say I only want to run it on the last two partitions (dates 2020-06-16 and 2020-06-15). Is there a straightforward way to do this in Hive? In other words, I want to check only the top n partitions at a given time.
Upvotes: 0
Views: 576
Reputation: 1814
Try using a subquery with ORDER BY and LIMIT 2 to get the two latest dates:
SELECT count(*)
FROM table1
WHERE date IN (SELECT DISTINCT date FROM table1 ORDER BY date DESC LIMIT 2)
GROUP BY col1
HAVING count(*) > 1
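If your Hive version does not accept ORDER BY and LIMIT inside an IN subquery, the same idea can be sketched as a join against a derived table of the two latest dates. This is only a sketch under assumptions: the alias names (t, latest, cnt) are made up for illustration, col1 is added to the SELECT list so you can see which values repeat, and the backticks around date are there in case date is treated as a reserved keyword in your setup.

```sql
-- Sketch: restrict the duplicate check to the two most recent partition dates.
-- Aliases t, latest and cnt are illustrative, not from the original query.
SELECT t.col1, count(*) AS cnt
FROM table1 t
JOIN (
  -- One row per distinct date, newest first, keeping only the top 2.
  SELECT `date`
  FROM table1
  GROUP BY `date`
  ORDER BY `date` DESC
  LIMIT 2
) latest
  ON t.`date` = latest.`date`
GROUP BY t.col1
HAVING count(*) > 1;
```

Change LIMIT 2 to LIMIT n to check the top n partitions. Whether Hive prunes partitions for the outer scan here depends on your version and settings, so it is worth checking the plan with EXPLAIN.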
Upvotes: 2
Reputation: 1270993
I'm not sure if this solves your problem, but you can reference the current date:
where date >= date_sub(current_date, 2)
Upvotes: 0