Reputation: 205
Consider the following records in a table:
NAME ID RATE LOC DAY
ABCD 123 -5 NYC 2017-01-01
ABCD 123 -5 NYC 2017-01-02
ABCD 123 -6 SFO 2017-01-03
ABCD 123 -6 DEN 2017-01-04
ABCD 345 -4 ATL 2017-01-05
ABCD 345 -4 WAS 2017-01-06
ABCD 123 -7 CLT 2017-01-07
ABCD 123 -7 CLT 2017-01-08
I would like to have an output like:
NAME ID RATE LOC START DAY END DAY
ABCD 123 -5 NYC 2017-01-01 2017-01-02
ABCD 123 -6 SFO 2017-01-03 2017-01-03
ABCD 123 -6 DEN 2017-01-04 2017-01-04
ABCD 345 -4 ATL 2017-01-05 2017-01-05
ABCD 345 -4 WAS 2017-01-06 2017-01-06
ABCD 123 -7 CLT 2017-01-07 2017-01-08
How do I do this in SQL or HIVE? I tried using max over partition and row_number. It doesn't seem to work. Any ideas are greatly appreciated.
This is the sql I tried:
select *
from (
select name
,id
,min(day) over (partition by name
,id) as start_date
,max(day) over (partition by name
,id) as end_date
,row_number () over (partition by name
,id
order by day asc) as row1
from table
) a
where row1=1;
Upvotes: 1
Views: 400
Reputation: 49260
This can be achieved with a difference of row numbers approach. To understand how it works, run the inner query and look at the results.
select name,id,min(day),max(day)
from (select t.*,
row_number() over(order by day)
-row_number() over(partition by name,id order by day) as grp
from tbl t
) t
group by name,id,grp
Upvotes: 1