Group contiguous blocks for aggregation in SQL (Redshift)

Question

I've got a table like this:

    id time activity
 1:  1    1        a
 2:  1    2        a
 3:  1    3        b
 4:  1    4        b
 5:  1    5        a
 6:  2    1        a
 7:  2    2        b
 8:  2    3        b
 9:  2    4        b
10:  2    5        a
11:  2    6        a
12:  2    7        c
13:  2    8        c
14:  2    9        c

Within each id, I would like to aggregate by contiguous blocks of activity. So essentially I want a grouping column like this:

    id time activity grouping
 1:  1    1        a        1
 2:  1    2        a        1
 3:  1    3        b        2
 4:  1    4        b        2
 5:  1    5        a        3
 6:  2    1        a        1
 7:  2    2        b        2
 8:  2    3        b        2
 9:  2    4        b        2
10:  2    5        a        3
11:  2    6        a        3
12:  2    7        c        4
13:  2    8        c        4
14:  2    9        c        4

So that I can use aggregate functions and get something like this:

select id
, min(time) as min_time
, max(time) as max_time
, count(*) as n_activity
from A
group by id, grouping

   id min_time max_time n_activity
1:  1        1        2          2
2:  1        3        4          2
3:  1        5        5          1
4:  2        1        1          1
5:  2        2        4          3
6:  2        5        6          2
7:  2        7        9          3

How can I create the grouping column? My table is quite large, so I would prefer to avoid cursor functions if possible.

Some sample data:

create table A (id int, time int, activity varchar);
insert into A (id, time, activity)
values
(1,1,'a'),(1,2,'a'),(1,3,'b'),(1,4,'b'),(1,5,'a'),(2,1,'a'),
(2,2,'b'),(2,3,'b'),(2,4,'b'),(2,5,'a'),(2,6,'a'),(2,7,'c'),
(2,8,'c'),(2,9,'c')

Vamsi Prabhala · Accepted Answer

Use lag to check if the previous row has the same activity as current, if it isn't reset it with a running sum.

select t.*,sum(case when prev_activity=activity then 0 else 1 end) over(partition by id order by time) as grp
from (
select t.*,lag(activity) over(partition by id order by time) as prev_activity
from tbl t
) t

Group contiguous blocks for aggregation in SQL (Redshift)

Answers (2)

Related Questions