formicaman

Reputation: 1357

Only query last n partitions

I have a table (table1) that is partitioned by date. It could look something like this:

╔════════════╦═══════╦══════╗
║    date    ║ col1  ║ col2 ║
╠════════════╬═══════╬══════╣
║ 2020-06-16 ║ Jeff  ║ 3    ║
║ 2020-06-16 ║ Bob   ║ 148  ║
║ 2020-06-16 ║ Karen ║ 101  ║
║ 2020-06-15 ║ Joy   ║ 9    ║
║ 2020-06-15 ║ Joe   ║ 33   ║
║ 2020-06-12 ║ Kevin ║ 48   ║
║ 2020-06-12 ║ Raj   ║ 100  ║
║ 2020-06-12 ║ Karen ║ 9    ║
║ ...        ║ ...   ║ ...  ║
╚════════════╩═══════╩══════╝

I want to run the query SELECT count(*) FROM table1 GROUP BY col1 HAVING count(*) > 1, but only on the last two partitions (dates 2020-06-16 and 2020-06-15). Is there a straightforward way to do this in Hive? In other words, I want to check only the latest n partitions at a given time.

Upvotes: 0

Views: 576

Answers (2)

Chananel P

Reputation: 1814

Try a subquery that uses ORDER BY and LIMIT 2 to get the two latest dates:

SELECT count(*) 
FROM table1 
WHERE date IN (SELECT DISTINCT date FROM table1 ORDER BY date DESC LIMIT 2)
GROUP BY col1 
HAVING count(*) > 1
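To sanity-check the logic of this subquery on the sample rows from the question, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. SQLite is only a stand-in for Hive here, so this says nothing about partition pruning or Hive-specific syntax. The helper name latest_n_counts, the added col1 output column, and the assumption that the eight visible rows are the whole table are mine, not part of the original answer.

```python
import sqlite3

# Visible sample rows from the question (assumed to be the whole table).
ROWS = [
    ("2020-06-16", "Jeff", 3),
    ("2020-06-16", "Bob", 148),
    ("2020-06-16", "Karen", 101),
    ("2020-06-15", "Joy", 9),
    ("2020-06-15", "Joe", 33),
    ("2020-06-12", "Kevin", 48),
    ("2020-06-12", "Raj", 100),
    ("2020-06-12", "Karen", 9),
]

# Same shape as the query above; col1 is selected too so the groups are visible.
QUERY = """
SELECT col1, count(*) AS cnt
FROM table1
WHERE date IN (SELECT DISTINCT date FROM table1 ORDER BY date DESC LIMIT ?)
GROUP BY col1
HAVING count(*) > ?
"""


def latest_n_counts(n, min_count):
    """Hypothetical helper: count rows per col1 over the n latest dates."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Hive
    conn.execute("CREATE TABLE table1 (date TEXT, col1 TEXT, col2 INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, (n, min_count)).fetchall()
    conn.close()
    return sorted(result)
```

On this sample, latest_n_counts(2, 1) returns an empty list because no name repeats within the two latest dates, while latest_n_counts(3, 1) returns Karen with a count of 2.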

Upvotes: 2

Gordon Linoff

Reputation: 1270993

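I'm not sure this solves your problem, but you can filter relative to the current date:

where date >= date_sub(current_date, 2)

Note that date_sub subtracts days, so the offset must be positive; a negative offset would move the cutoff into the future.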
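A filter relative to the current date selects by calendar day rather than by partition, so it only matches "the latest n partitions" when no days are missing. The sketch below illustrates the difference in plain Python, using the three partition dates visible in the question (assumed to be the full set). The function names partitions_since and latest_partitions are hypothetical.

```python
from datetime import date, timedelta

# Partition dates from the question's sample table (assumed to be the full set).
PARTITIONS = [date(2020, 6, 16), date(2020, 6, 15), date(2020, 6, 12)]


def partitions_since(today, days_back):
    """Partitions on or after today minus days_back (calendar-window filter)."""
    cutoff = today - timedelta(days=days_back)
    return [p for p in PARTITIONS if p >= cutoff]


def latest_partitions(n):
    """The n most recent partitions, regardless of gaps in the calendar."""
    return sorted(PARTITIONS, reverse=True)[:n]
```

With today set to 2020-06-16 and a two-day window, both functions agree on 2020-06-16 and 2020-06-15. Run two days later with the same window, the calendar filter keeps only 2020-06-16, while latest_partitions(2) still returns both dates.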

Upvotes: 0
