iskandarblue

Reputation: 7526

Min and max of grouped time sequences in SQL

I have a large Postgres table test from which I would like to extract consecutive sequences of no_signal states per mobile_id, or in other words the length of time individual mobile devices go out of service.

In the real table, records are not ordered, so I think a window function with a PARTITION BY mobile_id ... ORDER BY time clause would be needed. Any advice on how to create a group for each consecutive sequence, and then take the min and max per group, would be appreciated.

CREATE TABLE test (mobile_id int, state varchar, time timestamp, region varchar);

INSERT INTO test (mobile_id, state, time, region ) VALUES
(1, 'active', TIMESTAMP '2018-08-09 15:00:00', 'EU'),  
(1, 'active', TIMESTAMP '2018-08-09 16:00:00', 'EU'),
(1, 'no_signal', TIMESTAMP '2018-08-09 17:00:00', 'EU'),
(1, 'no_signal', TIMESTAMP '2018-08-09 18:00:00', 'EU'),
(1, 'no_signal', TIMESTAMP '2018-08-09 19:00:00', 'EU'),
(1, 'active', TIMESTAMP '2018-08-09 20:00:00', 'EU'),
(1, 'inactive', TIMESTAMP '2018-08-09 21:00:00', 'EU'),
(1, 'active', TIMESTAMP '2018-08-09 22:00:00', 'EU'),
(1, 'active', TIMESTAMP '2018-08-09 23:00:00', 'EU'),
(2, 'active', TIMESTAMP '2018-08-10 00:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 01:00:00', 'EU'),
(2, 'active', TIMESTAMP '2018-08-10 02:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 03:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 04:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 05:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 06:00:00', 'EU'),
(3, 'active', TIMESTAMP '2018-08-10 07:00:00', 'SA'),
(3, 'active', TIMESTAMP '2018-08-10 08:00:00', 'SA'),
(3, 'no_signal', TIMESTAMP '2018-08-10 09:00:00', 'SA'),
(3, 'no_signal', TIMESTAMP '2018-08-10 10:00:00', 'SA'),
(3, 'inactive', TIMESTAMP '2018-08-10 11:00:00', 'SA'),
(3, 'inactive', TIMESTAMP '2018-08-10 12:00:00', 'SA'),
(3, 'no_signal', TIMESTAMP '2018-08-10 13:00:00', 'SA');

The output that I am aiming for would be something like this:

 mobile_id          start_time            end_time diff_time region
         1 2018-08-09 17:00:00 2018-08-09 19:00:00       120     EU
         2 2018-08-10 01:00:00 2018-08-10 01:00:00         0     EU
         2 2018-08-10 03:00:00 2018-08-10 06:00:00       180     EU
         3 2018-08-10 09:00:00 2018-08-10 10:00:00        60     SA
         3 2018-08-10 13:00:00 2018-08-10 13:00:00         0     SA

The following code does not produce the desired result as groups are not correctly created:

select mobile_id, region,
       least(extract(epoch from max(time) - min(time)), 0) as diff
from (select t.*,
             count(*) filter (where state = 'no_signal') over (partition by mobile_id, region order by time) as grp
      from test
     ) t
group by mobile_id, region, grp;

Upvotes: 3

Views: 971

Answers (2)

S-Man

Reputation: 23766

demo: db<>fiddle

SELECT DISTINCT
    mobile_id,
    first_value(time) over (partition by mobile_id, ranked order by time) as start_time,        -- B
    first_value(time) over (partition by mobile_id, ranked order by time desc) as end_time,
    region
FROM
(
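    SELECT *, SUM(is_diff) OVER (PARTITION BY mobile_id ORDER BY time) as ranked   -- A
    FROM
    (
        SELECT *,
            -- compare each row with the previous row of the same device only
            CASE WHEN state = lag(state) over (partition by mobile_id order by time) THEN 0 ELSE 1 END as is_diff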
        FROM test 
    ) s
) s
WHERE
    state = 'no_signal';

A: The difficulty is that you need to order by one column (time) but partition by consecutive runs of another (state). The subquery solves this: is_diff marks every row whose state differs from the previous row's, and the running SUM over is_diff gives each consecutive run of the same state its own number (ranked), which can then be used as the window partition. The problem is discussed here. There may be a neater solution, but this subquery works.

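B: With that partition in place, start_time is first_value(time) with the window ordered by time, and end_time is first_value(time) with the window ordered by time DESC: sorting in descending order puts the latest time first. (last_value() is often surprising here because, when the window has an ORDER BY, the default frame ends at the current row, so it returns the current row's value rather than the last value of the partition.) DISTINCT then collapses the identical rows of each run into one.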
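
The flag-and-running-sum idea from A and B can also be traced outside the database. The following is only a minimal Python sketch, not the answer's query: the function name no_signal_islands and the sample rows are hypothetical, and it assumes the grouping is done per mobile_id, with the previous row taken from the same device.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical sample rows (mobile_id, state, time), deliberately unordered.
rows = [
    (2, "no_signal", datetime(2018, 8, 10, 3)),
    (1, "no_signal", datetime(2018, 8, 9, 18)),
    (1, "active", datetime(2018, 8, 9, 16)),
    (1, "no_signal", datetime(2018, 8, 9, 17)),
    (2, "active", datetime(2018, 8, 10, 2)),
    (2, "no_signal", datetime(2018, 8, 10, 1)),
    (1, "active", datetime(2018, 8, 9, 20)),
    (1, "no_signal", datetime(2018, 8, 9, 19)),
]


def no_signal_islands(rows):
    """Return (mobile_id, start, end, minutes) for each run of no_signal."""
    islands = []
    # Equivalent of PARTITION BY mobile_id ORDER BY time (an assumption here).
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for mobile_id, device_rows in groupby(ordered, key=lambda r: r[0]):
        ranked = 0         # running SUM(is_diff)
        prev_state = None  # lag(state); None for the first row of a device
        runs = {}          # ranked -> times of the no_signal rows in that run
        for _, state, time in device_rows:
            is_diff = 0 if state == prev_state else 1
            ranked += is_diff
            prev_state = state
            if state == "no_signal":  # the WHERE state = 'no_signal' filter
                runs.setdefault(ranked, []).append(time)
        for times in runs.values():
            start, end = min(times), max(times)
            minutes = (end - start).total_seconds() / 60
            islands.append((mobile_id, start, end, minutes))
    return islands


result = no_signal_islands(rows)
for row in result:
    print(row)
```

Every change of state bumps the counter, so all rows of one uninterrupted run share the same ranked value, and min/max per value give the start and end of the outage.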


To keep the focus on the grouping problem, I left the diff calculation out of the query above. To add it, wrap that query in another subquery:

SELECT 
    *,  
    EXTRACT(epoch from (end_time - start_time)) / 60 as diff
FROM (
    -- <QUERY ABOVE>
) s

Upvotes: 1

Tim Biegeleisen

Reputation: 522762

This is a variant of the gaps-and-islands problem. In this case, you are trying to detect multiple islands of records having no_signal for each mobile_id.

This answer uses the "difference in row numbers" method. The trick is to apply ROW_NUMBER to the table in two ways. The first numbers all records of each mobile_id, ordered by time; the second numbers, again per mobile_id, only those records whose state is no_signal. Within a consecutive run of no_signal records both numbers increase by one per row, so their difference stays constant, and it changes as soon as a record with another state intervenes. That difference therefore identifies each island. Then we only need to aggregate and take the min/max timestamp per island to get the result you want.

WITH cte1 AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY mobile_id ORDER BY time) rn1
    FROM test
),
cte2 AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY mobile_id ORDER BY time) rn2
    FROM test
    WHERE state = 'no_signal'
),
cte3 AS (
    SELECT t1.*, t2.rn2
    FROM cte1 t1
    LEFT JOIN cte2 t2
        ON t1.mobile_id = t2.mobile_id AND t1.time = t2.time
    WHERE t1.state = 'no_signal'
)

SELECT
    mobile_id,
    MIN(time) AS start_time,
    MAX(time) AS end_time,
    EXTRACT(epoch FROM MAX(time::timestamp) - MIN(time::timestamp)) / 60 diff_time,
    region
FROM cte3
GROUP BY
    mobile_id,
    region,
    (rn1 - rn2)
ORDER BY
    mobile_id,
    start_time;
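
To see why the row-number difference isolates each island, the same bookkeeping can be traced in a few lines of Python. This is only an illustrative sketch: the function name islands_by_row_number_difference is hypothetical, the rows are the mobile_id 3 records from the question (shuffled), and both row numbers are counted within each mobile_id, an assumption made here so that interleaved timestamps from other devices cannot split an island.

```python
from collections import defaultdict
from datetime import datetime

# The mobile_id 3 rows from the question as (mobile_id, state, time), shuffled.
rows = [
    (3, "no_signal", datetime(2018, 8, 10, 13)),
    (3, "active", datetime(2018, 8, 10, 7)),
    (3, "no_signal", datetime(2018, 8, 10, 10)),
    (3, "inactive", datetime(2018, 8, 10, 11)),
    (3, "active", datetime(2018, 8, 10, 8)),
    (3, "no_signal", datetime(2018, 8, 10, 9)),
    (3, "inactive", datetime(2018, 8, 10, 12)),
]


def islands_by_row_number_difference(rows):
    """Return (mobile_id, start, end, minutes) per island of no_signal rows."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    rn1 = defaultdict(int)      # row number over all rows of a device
    rn2 = defaultdict(int)      # row number over its no_signal rows only
    groups = defaultdict(list)  # (mobile_id, rn1 - rn2) -> times
    for mobile_id, state, time in ordered:
        rn1[mobile_id] += 1
        if state == "no_signal":
            rn2[mobile_id] += 1
            # Constant inside a run, larger after every non-matching row.
            key = (mobile_id, rn1[mobile_id] - rn2[mobile_id])
            groups[key].append(time)
    return [
        (mobile_id, min(times), max(times),
         (max(times) - min(times)).total_seconds() / 60)
        for (mobile_id, _), times in sorted(groups.items())
    ]


result = islands_by_row_number_difference(rows)
for row in result:
    print(row)
```

For these rows the no_signal records sit at positions 3, 4 and 7 and are numbered 1, 2 and 3 among themselves, so the differences are 2, 2 and 4: two islands, matching the two mobile_id 3 rows of the desired output.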


Demo

Upvotes: 1
