Reputation: 145775
I've got a table like this:
id time activity
1: 1 1 a
2: 1 2 a
3: 1 3 b
4: 1 4 b
5: 1 5 a
6: 2 1 a
7: 2 2 b
8: 2 3 b
9: 2 4 b
10: 2 5 a
11: 2 6 a
12: 2 7 c
13: 2 8 c
14: 2 9 c
Within each id
, I would like to aggregate by contiguous blocks of activity
. So essentially I want a grouping
column like this:
id time activity grouping
1: 1 1 a 1
2: 1 2 a 1
3: 1 3 b 2
4: 1 4 b 2
5: 1 5 a 3
6: 2 1 a 1
7: 2 2 b 2
8: 2 3 b 2
9: 2 4 b 2
10: 2 5 a 3
11: 2 6 a 3
12: 2 7 c 4
13: 2 8 c 4
14: 2 9 c 4
So that I can use aggregate functions and get something like this:
select id
, min(time) as min_time
, max(time) as max_time
, count(*) as n_activity
from A
group by id, grouping
id min_time max_time n_activity
1: 1 1 2 2
2: 1 3 4 2
3: 1 5 5 1
4: 2 1 1 1
5: 2 2 4 3
6: 2 5 6 2
7: 2 7 9 3
How can I create the grouping column? My table is quite large, so I would prefer to avoid cursor functions if possible.
Some sample data:
create table A (id int, time int, activity varchar);
insert into A (id, time, activity)
values
(1,1,'a'),(1,2,'a'),(1,3,'b'),(1,4,'b'),(1,5,'a'),(2,1,'a'),
(2,2,'b'),(2,3,'b'),(2,4,'b'),(2,5,'a'),(2,6,'a'),(2,7,'c'),
(2,8,'c'),(2,9,'c')
Upvotes: 2
Views: 293
Reputation: 86716
Should be able to just use the time
value and a secondary sequence of numbers from ROW_NUMBER()
?
SELECT
*,
time - ROW_NUMBER() OVER (PARTITION BY id, activity
ORDER BY time ) AS rownum
FROM
yourTable
The fields (id,activity,rownum)
the gives a composite key for your groups.
You can then wrap DENSE_RANK() OVER (PARTITION BY id ORDER BY rownum, activity DESC)
aroudn the whole thing, if you really need it as a single field identifier.
id time activity rownum (time-rownum) (composite key) (dense_rank)
1: 1 1 a 1 0 (1,a,0) 1
2: 1 2 a 2 0 (1,a,0) 1
3: 1 3 b 1 2 (1,b,2) 2
4: 1 4 b 2 2 (1,b,2) 2
5: 1 5 a 3 2 (1,a,2) 3
6: 2 1 a 1 0 (2,a,0) 1
7: 2 2 b 1 1 (2,b,1) 2
8: 2 3 b 2 1 (2,b,1) 2
9: 2 4 b 3 1 (2,b,1) 2
10: 2 5 a 2 3 (2,a,3) 3
11: 2 6 a 3 3 (2,a,3) 3
12: 2 7 c 1 6 (2,c,6) 4
13: 2 8 c 2 6 (2,c,6) 4
14: 2 9 c 3 6 (2,c,6) 4
Applying the composite key to your aggregation example...
SELECT
id
, min(time) as min_time
, max(time) as max_time
, count(*) as n_activity
FROM
(
SELECT
*,
time - ROW_NUMBER() OVER (PARTITION BY id, activity
ORDER BY time ) AS rownum
FROM
yourTable
)
partitioned
GROUP BY
id, activity, rownum
If time is sorted, but not always contiguous, that becomes...
SELECT
id
, min(time) as min_time
, max(time) as max_time
, count(*) as n_activity
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY time )
-
ROW_NUMBER() OVER (PARTITION BY id, activity
ORDER BY time ) AS rownum
FROM
yourTable
)
partitioned
GROUP BY
id, activity, rownum
Upvotes: 1
Reputation: 49260
Use lag
to check if the previous row has the same activity as current, if it isn't reset it with a running sum.
select t.*,sum(case when prev_activity=activity then 0 else 1 end) over(partition by id order by time) as grp
from (
select t.*,lag(activity) over(partition by id order by time) as prev_activity
from tbl t
) t
Upvotes: 3