Someguywhocodes
Someguywhocodes

Reputation: 781

Hive - max (rather than last) date in quarter

I'm querying a table and only want to select the end of quarter dates, I've done so like this:

select
    yyyy_mm_dd,
    id
from
    t1
where
    yyyy_mm_dd = cast(date_add(trunc(add_months(yyyy_mm_dd,3-pmod(month(yyyy_mm_dd)-1,3)),'MM'),-1) as date) --last day of q

With daily rows, from 2020-01-01 until 2020-12-31, the above works fine. However, 2021 rows end up being omitted as the quarter is incomplete. How could I modify the where clause so I select the last day of each quarter and the max date in the current quarter?

Upvotes: 1

Views: 209

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270463

It is not clear if you have multiple rows on the end-of-quarter dates. It might be safer to take the max and use that:

select t1.*
from (select t1.*,
             max(yyyy_mm_dd) over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd)) as max_yyyy_mm_dd
      from t1
     ) t1
where yyyy_mm_dd = max_yyyy_mm_dd;

Note that this uses t1.* for the select. If you only wanted the maximum date, you can aggregate:

select id, max(yyyy_mm_dd)
from t1
group by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd);

Upvotes: 1

mck
mck

Reputation: 42392

You can assign a row number for each quarter in descending order of date, and filter the rows with row number equals 1 (last date in each quarter):

select yyyy_mm_dd, id
from
(select
    yyyy_mm_dd,
    id,
    row_number() over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd) order by yyyy_mm_dd desc) as rn
from
    t1
) t2
where rn = 1

Upvotes: 1

Related Questions