Count Zero

Reputation: 337

Aggregate PostgreSQL rows into groups based on interval

I have a result set from a CTE that returns the interval between each row insertion like follows:

row  interval
1    0 years 0 mons 0 days 1 hours 0 mins 16.0 secs
2    0 years 0 mons 0 days 0 hours 45 mins 42.0 secs
3    0 years 0 mons 0 days 0 hours 0 mins 20.0 secs
4    0 years 0 mons 0 days 5 hours 4 mins 19.0 secs
5    0 years 0 mons 0 days 0 hours 2 mins 32.0 secs
6    0 years 0 mons 0 days 0 hours 0 mins 25.0 secs
7    0 years 0 mons 0 days 0 hours 1 mins 9.0 secs
8    0 years 0 mons 0 days 0 hours 0 mins 25.0 secs
9    0 years 0 mons 0 days 0 hours 0 mins 16.0 secs
10   0 years 0 mons 0 days 2 hours 50 mins 24.0 secs
11   0 years 0 mons 0 days 1 hours 6 mins 49.0 secs
12   0 years 0 mons 0 days 0 hours 4 mins 6.0 secs
13   0 years 0 mons 0 days 0 hours 1 mins 6.0 secs
14   0 years 0 mons 0 days 0 hours 5 mins 48.0 secs
15   0 years 0 mons 0 days 0 hours 3 mins 42.0 secs
16   0 years 0 mons 0 days 0 hours 0 mins 22.0 secs
17   0 years 0 mons 0 days 0 hours 0 mins 30.0 secs
18   0 years 0 mons 0 days 0 hours 0 mins 19.0 secs
19   0 years 0 mons 0 days 0 hours 0 mins 16.0 secs
20   0 years 0 mons 0 days 0 hours 1 mins 55.0 secs

What I need is to aggregate these rows into groups: a row whose interval exceeds a cut-off value begins a new group, and every following row below the cut-off belongs to that group until the next row above the cut-off starts another one. For example, with a cut-off of 1 hour, rows 1-3 would be grouped together: row 1 is above the cut-off and begins a new group, and rows 2 and 3 are included because they are below it. At row 4 a new group would be created because row 4 is above the cut-off, and all subsequent rows up to row 9 would be included in it. Row 10 would be a group on its own because row 11 is also above the cut-off, and so on. Thanks very much for any assistance.

create table tmp (x int, delta interval);

insert into tmp (x, delta)
values  (1, '0 years 0 mons 0 days 1 hours 0 mins 16.0 secs'),
        (2, '0 years 0 mons 0 days 0 hours 45 mins 42.0 secs'),
        (3, '0 years 0 mons 0 days 0 hours 0 mins 20.0 secs'),
        (4, '0 years 0 mons 0 days 0 hours 4 mins 19.0 secs'),
        (5, '0 years 0 mons 0 days 0 hours 2 mins 32.0 secs'),
        (6, '0 years 0 mons 0 days 0 hours 0 mins 25.0 secs'),
        (7, '0 years 0 mons 0 days 0 hours 1 mins 9.0 secs'),
        (8, '0 years 0 mons 0 days 0 hours 0 mins 25.0 secs'),
        (9, '0 years 0 mons 0 days 0 hours 0 mins 16.0 secs'),
        (10, '0 years 0 mons 0 days 2 hours 50 mins 24.0 secs'),
        (11, '0 years 0 mons 0 days 1 hours 6 mins 49.0 secs'),
        (12, '0 years 0 mons 0 days 0 hours 4 mins 6.0 secs'),
        (13, '0 years 0 mons 0 days 0 hours 1 mins 6.0 secs'),
        (14, '0 years 0 mons 0 days 0 hours 5 mins 48.0 secs'),
        (15, '0 years 0 mons 0 days 0 hours 3 mins 42.0 secs'),
        (16, '0 years 0 mons 0 days 0 hours 0 mins 22.0 secs'),
        (17, '0 years 0 mons 0 days 0 hours 0 mins 30.0 secs'),
        (18, '0 years 0 mons 0 days 0 hours 0 mins 19.0 secs'),
        (19, '0 years 0 mons 0 days 0 hours 0 mins 16.0 secs'),
        (20, '0 years 0 mons 0 days 0 hours 1 mins 55.0 secs');

Upvotes: 0

Views: 226

Answers (1)

Blue Star

Reputation: 1952

It's unclear to me from your question if 10 consecutive rows with a delta of 10 minutes each should be lumped into one group or two. If you want them all to be one group, you can do something like this:

WITH ranges AS (
    SELECT x AS start, lead(x) OVER (ORDER BY x) AS "end"
    FROM tmp
    WHERE delta > interval '1 hour'
)
SELECT ranges.start, string_agg(x::text, ', ' ORDER BY x)
FROM tmp, ranges
WHERE tmp.x >= ranges.start
AND (ranges."end" IS NULL OR tmp.x < ranges."end")
GROUP BY ranges.start, ranges."end"
ORDER BY ranges.start;

 start |               string_agg
-------+----------------------------------------
     1 | 1, 2, 3, 4, 5, 6, 7, 8, 9
    10 | 10
    11 | 11, 12, 13, 14, 15, 16, 17, 18, 19, 20

First we identify the rows with a delta greater than the cutoff value, which represent the cutoff points, and we use a window function to determine the range. Then it's just a matter of aggregating over those ranges.
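An equivalent formulation, sketched here using the standard "gaps and islands" pattern against the same tmp (x, delta) table, avoids the range join entirely: a running count of the cut-off rows gives every row the id of the group it belongs to.

```sql
-- Each row with delta above the cut-off increments the running count,
-- so every row carries a group number; aggregate per group number.
SELECT min(x) AS start,
       string_agg(x::text, ', ' ORDER BY x) AS members
FROM (
    SELECT x,
           count(*) FILTER (WHERE delta > interval '1 hour')
               OVER (ORDER BY x) AS grp
    FROM tmp
) numbered
GROUP BY grp
ORDER BY grp;
```

This should produce the same three groups as above, and it scales to many cut-off rows without the cross join.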

However, if you want the groups to be no more than one hour long, then that's trickier. I'm not sure if it can be done without a loop.
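It can be done without a procedural loop, though. One way to sketch that variant (untested, and assuming the x values are consecutive with no gaps) is a recursive CTE that carries a running total of the deltas and starts a new group once the accumulated interval exceeds one hour:

```sql
WITH RECURSIVE walk AS (
    -- anchor: the first row starts group 1
    SELECT x, delta AS running, 1 AS grp
    FROM tmp
    WHERE x = 1
    UNION ALL
    -- step: either extend the current group's running total,
    -- or reset it and open a new group
    SELECT t.x,
           CASE WHEN w.running + t.delta > interval '1 hour'
                THEN t.delta ELSE w.running + t.delta END,
           CASE WHEN w.running + t.delta > interval '1 hour'
                THEN w.grp + 1 ELSE w.grp END
    FROM walk w
    JOIN tmp t ON t.x = w.x + 1
)
SELECT grp, string_agg(x::text, ', ' ORDER BY x) AS members
FROM walk
GROUP BY grp
ORDER BY grp;
```

The recursion is needed because each row's group depends on the running total of the previous row, which a plain window function cannot express when the total resets.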

Upvotes: 1
