Daryl G
Daryl G

Reputation: 3

Partition data, detect value change in column, and ADD new COLUMN with DATE of change

Given the data below, I would like to:
id device date_observation device_state reading
2021100 A 2021-10-14 00:00:00.000 1 -0.3623
2021101 B 2021-10-14 00:00:00.000 1 -0.0004
2021102 C 2021-10-14 00:00:00.000 1 0.1333
2021103 A 2021-10-21 00:00:00.000 1 0.1907
2021104 B 2021-10-21 00:00:00.000 2 -0.3548
2021105 C 2021-10-21 00:00:00.000 1 -0.2275
2021106 A 2021-10-28 00:00:00.000 2 -0.2404
2021107 B 2021-10-28 00:00:00.000 2 -0.0244
2021108 C 2021-10-28 00:00:00.000 1 -0.3441
2021109 A 2021-11-04 00:00:00.000 2 0.2843
2021110 B 2021-11-04 00:00:00.000 3 0.2376
2021111 C 2021-11-04 00:00:00.000 1 -0.0247
2021112 A 2021-11-11 00:00:00.000 2 0.1814
2021113 B 2021-11-11 00:00:00.000 3 -0.2816
2021114 C 2021-11-11 00:00:00.000 1 -0.1493
2021115 A 2021-11-18 00:00:00.000 2 0.1694
2021116 B 2021-11-18 00:00:00.000 4 0.0864
2021117 C 2021-11-18 00:00:00.000 1 0.0395
2021118 A 2021-11-25 00:00:00.000 2 0.0658
2021119 B 2021-11-25 00:00:00.000 4 0.1291
2021120 C 2021-11-25 00:00:00.000 2 -0.1907

Using the code below, I am able partition data by device and to grab the latest record for each device.

WITH o AS (
SELECT *,
    ROW_NUMBER() OVER (PARTITION by device
                       ORDER BY date_observation DESC) AS queue
FROM observations  
)

SELECT *
FROM o
WHERE queue = 1

Results

id device date_observation device_state reading queue
2021118 A 2021-11-25 00:00:00.000 2 0.0658 1
2021119 B 2021-11-25 00:00:00.000 4 0.1291 1
2021120 C 2021-11-25 00:00:00.000 2 -0.1907 1

What is the best way to get the date of the latest state change?

For example the latest device state changed for device A occurred on 2021-10-28.

The expected output.

id device date_observation device_state reading queue date_state_change
2021118 A 2021-11-25 00:00:00.000 2 0.0658 1 2021-10-28
2021119 B 2021-11-25 00:00:00.000 4 0.1291 1 2021-11-18
2021120 C 2021-11-25 00:00:00.000 2 -0.1907 1 2021-11-25

Thoughts? Thanks.

Upvotes: 0

Views: 826

Answers (3)

Okechukwu Ossai
Okechukwu Ossai

Reputation: 604

You can also use LAG() to compare current row device_state and the previous row device state for each device and use MAX() to get the date_state_change.

WITH
f1 AS (
SELECT *, MAX(date_observation) OVER (PARTITION BY device) AS max_date,
       CASE WHEN device_state <> LAG(device_state) OVER (PARTITION BY device ORDER BY date_observation) 
       THEN date_observation END AS date_flag
FROM observations
),
f2 AS (
SELECT *, MAX(date_flag) OVER (PARTITION BY device) AS date_state_change
FROM f1
)
SELECT id, device, date_observation, device_state, reading, date_state_change  
FROM f2
WHERE max_date = date_observation;

See Demo

Upvotes: 0

Charlieface
Charlieface

Reputation: 71579

You can use more window functions

  • Use LEAD to identify the change-over rows
  • Use MAX to get the latest change-over date
WITH PrevValues AS (
    SELECT *,
      queue = ROW_NUMBER() OVER (PARTITION BY device ORDER BY date_observation DESC),
      date_state_change = CASE WHEN device_state <>
            LEAD(device_state, 1, -1) OVER (PARTITION BY device ORDER BY date_observation DESC)
            THEN date_observation END
    FROM observations o
),
MaxValues AS (
    SELECT *,
      max_date_state_change = MAX(date_state_change) OVER (PARTITION BY device)
    FROM PrevValues o
)
SELECT
  o.id,
  o.device,
  o.date_observation,
  o.device_state,
  o.reading,
  o.queue,
  o.max_date_state_change
FROM MaxValues o
WHERE o.queue = 1

db<>fiddle

Upvotes: 1

Christophe
Christophe

Reputation: 696

I propose you a solution built on your first part (I adpated it as I would have built the query) and a flow to calculate the latest state's change, whith the following steps:

  • add on each row the previous state
  • add an indicator of state's change
  • calculate max date for rows identified as state's change
  • join the result to yours

Here is the result I propose:

WITH prev_data AS (
  SELECT id, device, date_observation, device_state, reading,
         LAG(device_state) OVER (partition by device order by date_observation) as prev_state
    FROM observations
),
data_changes AS (
  SELECT id, device, date_observation, device_state, reading,
         CASE WHEN device_state = prev_state THEN 0 else 1 END as change_state
    FROM prev_data
),
last_change AS (
  SELECT device, max(date_observation) as date_state_change
    FROM data_changes
   WHERE change_state = 1
   GROUP BY device
),
max_date as (
    SELECT id, device, date_observation, device_state, reading,
           max(date_observation) over (partition by device) as max_date
      FROM observations
)
SELECT id, m.device, date_observation, device_state, reading, date_state_change
  FROM max_date m
  LEFT JOIN last_change l ON m.device = l.device
 WHERE date_observation = max_date

Result :

  id     device     date_observation         device_state      reading         date_state_change
2021118     A    2021-11-25 00:00:00.000          2            0.0658       2021-10-28 00:00:00.000
2021119     B    2021-11-25 00:00:00.000          4            0.1291       2021-11-18 00:00:00.000
2021120     C    2021-11-25 00:00:00.000          2            -0.1907      2021-11-25 00:00:00.000

You can check the result using dbFiddle

Upvotes: 0

Related Questions