Redshift - Group Table based on consecutive rows

Question

I am working right now with this table:

What I want to do is to clear up this table a little bit, grouping some consequent rows together.

Is there any form to achieve this kind of result?

The first table is already working fine, I just want to get rid of some rows to free some disk space.

Gordon Linoff · Accepted Answer

One method is to peak at the previous row to see when the value changes. Assuming that valid_to and valid_from are really dates:

select id, class, min(valid_to), max(valid_from)
from (select t.*,
             sum(case when prev_valid_to >= valid_from + interval '-1 day' then 0 else 1 end) over (partition by id order by valid_to rows between unbounded preceding and current row) as grp
      from (select t.*, 
                   lag(valid_to) over (partition by id, class order by valid_to) as prev_valid_to
            from t
           ) t
     ) t
group by id, class, grp;

If the are not dates, then this gets trickier. You could convert to dates. Or, you could use the difference of row_numbers:

select id, class, min(valid_from), max(valid_to)
from (select t.*,
             row_number() over (partition by id order by valid_from) as seqnum,
             row_number() over (partition by id, class order by valid_from) as seqnum_2
      from t
     ) t
group by id, class, (seqnum - seqnum_2)

Redshift - Group Table based on consecutive rows

Answers (1)

Related Questions