AS0207
AS0207

Reputation: 205

HIVE SQL Collapsing continuous ranges into single rows

Consider the following records in a table:

NAME    ID      RATE   LOC   DAY
ABCD    123      -5    NYC    2017-01-01
ABCD    123      -5    NYC    2017-01-02
ABCD    123      -6    SFO    2017-01-03
ABCD    123      -6    DEN    2017-01-04
ABCD    345      -4    ATL    2017-01-05
ABCD    345      -4    WAS    2017-01-06
ABCD    123      -7    CLT    2017-01-07
ABCD    123      -7    CLT    2017-01-08

I would like to have an output like:

NAME    ID      RATE  LOC   START DAY   END DAY
ABCD    123      -5   NYC   2017-01-01  2017-01-02
ABCD    123      -6   SFO   2017-01-03  2017-01-03
ABCD    123      -6   DEN   2017-01-04  2017-01-04
ABCD    345      -4   ATL   2017-01-05  2017-01-05
ABCD    345      -4   WAS   2017-01-06  2017-01-06
ABCD    123      -7   CLT   2017-01-07  2017-01-08

How do I do this in SQL or HIVE? I tried using max over partition and row_number. It doesn't seem to work. Any ideas are greatly appreciated.

This is the sql I tried:

select *     
  from (
        select name
              ,id
              ,min(day) over (partition by name
                                          ,id) as start_date
              ,max(day) over (partition by name
                                          ,id) as end_date                     
              ,row_number () over (partition by name
                                               ,id
                                 order by day asc) as row1
          from table
       ) a
where row1=1;

Upvotes: 1

Views: 400

Answers (1)

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

This can be achieved with a difference of row numbers approach. To understand how it works, run the inner query and look at the results.

select name,id,min(day),max(day)
from (select t.*,
      row_number() over(order by day)
      -row_number() over(partition by name,id order by day) as grp
      from tbl t
     ) t
group by name,id,grp

Upvotes: 1

Related Questions