marsl
marsl

Reputation: 1059

Event grouping in time series

I'm trying to build groups of precipitation events in my measurement data. I got a time, a measurement value and a flag noting if it's was raining:

00:00, 32.4, 0
00:10, 32.4, 0
00:20, 32.6, 1
00:30, 32.7, 1
00:40, 32.9, 1
00:50, 33.2, 1
01:00, 33.2, 0
01:10, 33.2, 0
01:20, 33.2, 0
01:30, 33.5, 1
01:40, 33.6, 1
01:50, 33.6, 0
02:00, 33.6, 0
...

Now I'd like to generate an event id for the precipitation events:

00:00, 32.4, 0, NULL
00:10, 32.4, 0, NULL
00:20, 32.6, 1, 1
00:30, 32.7, 1, 1
00:40, 32.9, 1, 1
00:50, 33.2, 1, 1
01:00, 33.2, 0, NULL
01:10, 33.2, 0, NULL
01:20, 33.2, 0, NULL
01:30, 33.5, 1, 2
01:40, 33.6, 1, 2
01:50, 33.6, 0, NULL
02:00, 33.6, 0, NULL
...

Then I'll be able to use grouping to summarize the events. Any hint how to do this in Oracle is much appreciated.

So far I was able to calculate the mentioned flag and the diff to the last row:

SELECT
  measured_at,
  station_id
  ps, -- precipitation sum
  ps - lag(ps, 1, NULL) OVER (ORDER BY measured_at ASC) as p, -- precipitation delta
  CASE 
    WHEN ps - lag(ps, 1, NULL) OVER (ORDER BY measured_at ASC) > 0 THEN 1
    ELSE 0 
    END as rainflag
FROM measurements;

I think it must be possible to generate the required event id somehow, but can't figure it out. Thanks for your time!

Final solution using mt0 answer:

DROP TABLE events;
CREATE TABLE events (measured_at, station_id, ps) AS
SELECT TO_DATE('2016-05-01 12:00', 'YYYY-MM-DD HH24:MI'), 'XYZ', 32.4 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:10', 'YYYY-MM-DD HH24:MI'), 'XYZ', 32.6 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:20', 'YYYY-MM-DD HH24:MI'), 'XYZ', 32.7 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:30', 'YYYY-MM-DD HH24:MI'), 'XYZ', 32.9 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:40', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:50', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:00', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:10', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:20', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.5 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:30', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.6 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:40', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.6 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:50', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.5 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 17:00', 'YYYY-MM-DD HH24:MI'), 'XYZ', 39.1 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 17:10', 'YYYY-MM-DD HH24:MI'), 'XYZ', 39.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 17:20', 'YYYY-MM-DD HH24:MI'), 'XYZ', 39.2 FROM DUAL;


WITH

flagged AS (
    SELECT
        measured_at,
        station_id,
        ps,
        CASE
          WHEN measured_at - lag(measured_at, 1, NULL) OVER (ORDER BY measured_at) = (1/144) THEN ps - lag(ps, 1, NULL) OVER (ORDER BY measured_at)
          ELSE NULL
        END as delta_p,
        CASE
          WHEN ps - lag(ps, 1, NULL) OVER (ORDER BY measured_at) > 0 THEN 1
          ELSE 0
        END AS rain
    FROM events
),

eventmarked AS (
    SELECT
        f.*,
        CASE
            WHEN f.delta_p >= 0 THEN f.delta_p
            ELSE NULL
        END AS p,
        CASE rain
            WHEN 1 THEN COUNT(1) OVER (ORDER BY measured_at) - SUM(rain) OVER (ORDER BY measured_at)
        END as event
    FROM flagged f
),

summarized AS (
    SELECT
        em.*,
        sum(CASE p WHEN 0 THEN NULL ELSE p END) OVER (PARTITION BY event ORDER BY measured_at) as e_ps
    FROM eventmarked em
)

SELECT measured_at, station_id, ps, p, e_ps FROM summarized
ORDER BY measured_at;

Upvotes: 1

Views: 97

Answers (2)

MT0
MT0

Reputation: 168371

Oracle Setup:

CREATE TABLE events ( measured_at, station_id, ps ) AS
SELECT '00:00', 32.4, 0 FROM DUAL UNION ALL
SELECT '00:10', 32.4, 0 FROM DUAL UNION ALL
SELECT '00:20', 32.6, 1 FROM DUAL UNION ALL
SELECT '00:30', 32.7, 1 FROM DUAL UNION ALL
SELECT '00:40', 32.9, 1 FROM DUAL UNION ALL
SELECT '00:50', 33.2, 1 FROM DUAL UNION ALL
SELECT '01:00', 33.2, 0 FROM DUAL UNION ALL
SELECT '01:10', 33.2, 0 FROM DUAL UNION ALL
SELECT '01:20', 33.2, 0 FROM DUAL UNION ALL
SELECT '01:30', 33.5, 1 FROM DUAL UNION ALL
SELECT '01:40', 33.6, 1 FROM DUAL UNION ALL
SELECT '01:50', 33.6, 0 FROM DUAL UNION ALL
SELECT '02:00', 33.6, 0 FROM DUAL;

Query:

SELECT measured_at,
       station_id,
       ps,
       CASE WHEN rainflag IS NOT NULL THEN DENSE_RANK() OVER ( ORDER BY rainflag ) END AS rainflag
FROM   (
  SELECT e.*,
         CASE ps
           WHEN 1
           THEN COUNT( 1 ) OVER ( ORDER BY measured_at )
                  - SUM( ps ) OVER ( ORDER BY measured_at )
           END AS rainflag
  FROM   events e
)
ORDER BY measured_at;

Query 2

SELECT measured_at,
       station_id,
       ps,
       CASE ps WHEN 1
               THEN SUM( rainflag ) OVER ( ORDER BY measured_at )
               END AS rainflag
FROM   (
  SELECT e.*,
         CASE WHEN ps > LAG( ps, 1, 0 ) OVER ( ORDER BY measured_at )
              THEN 1
              END AS rainflag
  FROM   events e
);

Output:

MEASURED_AT STATION_ID         PS   RAINFLAG
----------- ---------- ---------- ----------
00:00             32.4          0            
00:10             32.4          0            
00:20             32.6          1          1 
00:30             32.7          1          1 
00:40             32.9          1          1 
00:50             33.2          1          1 
01:00             33.2          0            
01:10             33.2          0            
01:20             33.2          0            
01:30             33.5          1          2 
01:40             33.6          1          2 
01:50             33.6          0            
02:00             33.6          0            

Upvotes: 1

Marmite Bomber
Marmite Bomber

Reputation: 21095

Alternative solution using only LAG function.

In the subquery the column PS2 marks the rain started events. The main query simple sums this flag while ignoring the time that is not raining.

with ev as (
 select measured_at, station_id, ps,
 case when ps = 1 and lag(ps,1,0) over (order by measured_at) = 0 
    then 1 else 0 end ps2 
 from events)
select measured_at, station_id, ps, ps2,
 case when ps = 1 then 
   sum(ps2) over (order by measured_at) end rf
from ev
;

MEASURED_AT STATION_ID         PS        PS2         RF
----------- ---------- ---------- ---------- ----------
00:00             32,4          0          0            
00:10             32,4          0          0            
00:20             32,6          1          1          1 
00:30             32,7          1          0          1 
00:40             32,9          1          0          1 
00:50             33,2          1          0          1 
01:00             33,2          0          0            
01:10             33,2          0          0            
01:20             33,2          0          0            
01:30             33,5          1          1          2 
01:40             33,6          1          0          2 
01:50             33,6          0          0            
02:00             33,6          0          0        

Upvotes: 0

Related Questions