Reputation: 5

Excluding rows in a SQL query based on values in another row while preserving multiple outputs of a single part ID

I have machine output data in the form:

DATETIME            ID       VALUE
8-28-20 20:55:10    part1    13
8-28-20 20:56:60    part1    20
8-28-20 20:57:22    part1    25
8-28-20 20:59:39    part2    9
8-28-20 21:10:55    part3    33
8-28-20 21:14:30    part1    14

and I need to generate a new table by removing some rows:

DATETIME            ID       VALUE
8-28-20 20:57:22    part1    25
8-28-20 20:59:39    part2    9
8-28-20 21:10:55    part3    33
8-28-20 21:14:30    part1    14

The machine sometimes collects multiple VALUEs for each run, but I only need the last one (it's cumulative). However, I may have multiple runs of the same ID per shift, and it's possible to have two consecutive runs of the same ID.

Is it possible with SQL to filter out every row whose next row has the same ID and a greater VALUE, so that only the last row of each run remains?

There are some similar questions posted here, but they all group rows and take a max value, which would capture only one run per ID for each time period.

Upvotes: 0

Answers (3)

marcothesane

Reputation: 6749

A bit more generic, and also an example of how to derive a session id without a dedicated OLAP function for it:

WITH
-- your input
input(dttm,id,value) AS (
          SELECT TIMESTAMP '2020-08-28 20:55:10','part1',13
UNION ALL SELECT TIMESTAMP '2020-08-28 20:56:60','part1',20
UNION ALL SELECT TIMESTAMP '2020-08-28 20:57:22','part1',25
UNION ALL SELECT TIMESTAMP '2020-08-28 20:59:39','part2',9
UNION ALL SELECT TIMESTAMP '2020-08-28 21:10:55','part3',33
UNION ALL SELECT TIMESTAMP '2020-08-28 21:14:30','part1',14
)
,
-- add a counter that is at 1 whenever the id changes over time
with_chg AS (
  SELECT
    CASE 
      WHEN LAG(id) OVER(ORDER BY dttm) <> id THEN 1
      ELSE 0
    END AS chg_count
  , *
  FROM input
)
,
-- use the running sum of that change counter to get a session id
with_session AS (
  SELECT
    SUM(chg_count) OVER(ORDER BY dttm) AS session_id
  , dttm
  , id
  , value
  FROM with_chg
)
,
-- partition by the session id, order by datetime descending to get
-- the row number of 1 for the right row
with_rownum AS (
  SELECT
    ROW_NUMBER() OVER(PARTITION BY session_id ORDER BY dttm DESC) AS rownum
  , dttm
  , id
  , value
  FROM with_session
)
-- finally, filter by row number 1 and order back by datetime
SELECT
  dttm
, id
, value
FROM with_rownum
WHERE rownum = 1
ORDER BY 1
;
-- out         dttm         |  id   | value 
-- out ---------------------+-------+-------
-- out  2020-08-28 20:57:22 | part1 |    25
-- out  2020-08-28 20:59:39 | part2 |     9
-- out  2020-08-28 21:10:55 | part3 |    33
-- out  2020-08-28 21:14:30 | part1 |    14
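
The question also mentions two consecutive runs of the same ID, which the change counter above would merge into one session. A minimal sketch of one possible extension, assuming a drop in the cumulative value marks a new run: the Python wrapper, the SQLite engine (3.25 or later for window functions), the function name last_row_per_session and the extra 21:20:05 row are all hypothetical and only there to make the example runnable.

```python
import sqlite3

# Sample rows from the question, plus one HYPOTHETICAL row (the last one)
# that simulates a second, back-to-back run of part1.
ROWS = [
    ("2020-08-28 20:55:10", "part1", 13),
    ("2020-08-28 20:56:60", "part1", 20),  # timestamp kept as in the question
    ("2020-08-28 20:57:22", "part1", 25),
    ("2020-08-28 20:59:39", "part2", 9),
    ("2020-08-28 21:10:55", "part3", 33),
    ("2020-08-28 21:14:30", "part1", 14),
    ("2020-08-28 21:20:05", "part1", 6),   # hypothetical: value drops, same id
]

QUERY = """
WITH with_chg AS (
  SELECT dttm, id, value,
         CASE
           WHEN LAG(id)    OVER (ORDER BY dttm) <> id    THEN 1  -- id changed
           WHEN LAG(value) OVER (ORDER BY dttm) >  value THEN 1  -- value reset (assumption)
           ELSE 0
         END AS chg_count
  FROM input
),
with_session AS (
  SELECT dttm, id, value,
         SUM(chg_count) OVER (ORDER BY dttm) AS session_id
  FROM with_chg
),
with_rownum AS (
  SELECT dttm, id, value,
         ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY dttm DESC) AS rownum
  FROM with_session
)
SELECT dttm, id, value
FROM with_rownum
WHERE rownum = 1
ORDER BY dttm
"""


def last_row_per_session(rows):
    """Load rows into an in-memory SQLite table and keep the last row of each session."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE input (dttm TEXT, id TEXT, value INTEGER)")
    conn.executemany("INSERT INTO input VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in last_row_per_session(ROWS):
    print(row)
```

Timestamps are stored as text here, so the ordering relies on the ISO-style format sorting lexicographically. If a run can legitimately restart at a higher value than the previous run ended on, this heuristic would not detect it.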

Upvotes: 1

Gordon Linoff

Reputation: 1270793

You seem to want the last row of each run, that is, each row whose next row has a different id or a lower value:

select t.*
from (select t.*,
             lead(id) over (order by datetime) as next_id,
             lead(value) over (order by datetime) as next_value
      from t
     ) t
where next_id is null or next_id <> id or
      (next_id = id and next_value < value)
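
To check the query against the sample data from the question, here is a small sketch that runs it in an in-memory SQLite database from Python. The engine choice (SQLite 3.25 or later), the column name dttm in place of datetime, and the function name last_row_per_run are assumptions made only for this illustration.

```python
import sqlite3

# Sample rows copied from the question (timestamps stored as sortable text).
SAMPLE = [
    ("2020-08-28 20:55:10", "part1", 13),
    ("2020-08-28 20:56:60", "part1", 20),  # timestamp kept as in the question
    ("2020-08-28 20:57:22", "part1", 25),
    ("2020-08-28 20:59:39", "part2", 9),
    ("2020-08-28 21:10:55", "part3", 33),
    ("2020-08-28 21:14:30", "part1", 14),
]

QUERY = """
SELECT dttm, id, value
FROM (
    SELECT t.*,
           LEAD(id)    OVER (ORDER BY dttm) AS next_id,
           LEAD(value) OVER (ORDER BY dttm) AS next_value
    FROM t
) AS s
WHERE next_id IS NULL        -- very last row
   OR next_id <> id          -- a different part follows
   OR next_value < value     -- same part, but the cumulative value restarted
ORDER BY dttm
"""


def last_row_per_run(rows):
    """Return the rows the LEAD-based filter keeps, ordered by time."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (dttm TEXT, id TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in last_row_per_run(SAMPLE):
    print(row)
```

On the sample data this keeps the 20:57:22, 20:59:39, 21:10:55 and 21:14:30 rows, which matches the desired output in the question. The third condition is what separates two back-to-back runs of the same ID, provided the second run starts at a lower value than the first one ended on.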

Upvotes: 0

Fahmi

Reputation: 37483

You can try the query below, which uses row_number() partitioned by hour and id:

select * from
(
select *, 
       row_number() over(partition by dateadd(hour, datediff(hour, 0, DATETIME), 0), id order by DATETIME desc) as rn
from tablename
)A where rn=1
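
A rough sketch of how this hour-bucket partitioning behaves, run in SQLite from Python rather than SQL Server. The substr(dttm, 1, 13) expression is only a stand-in I am assuming for the dateadd/datediff hour truncation, and the function name and the SAME_HOUR rows are hypothetical.

```python
import sqlite3

# Sample rows from the question.
SAMPLE = [
    ("2020-08-28 20:55:10", "part1", 13),
    ("2020-08-28 20:56:60", "part1", 20),  # timestamp kept as in the question
    ("2020-08-28 20:57:22", "part1", 25),
    ("2020-08-28 20:59:39", "part2", 9),
    ("2020-08-28 21:10:55", "part3", 33),
    ("2020-08-28 21:14:30", "part1", 14),
]

# HYPOTHETICAL data: two separate runs of part1 inside the same clock hour.
SAME_HOUR = [
    ("2020-08-28 20:05:00", "part1", 25),
    ("2020-08-28 20:20:00", "part2", 9),
    ("2020-08-28 20:40:00", "part1", 14),
]

QUERY = """
SELECT dttm, id, value
FROM (
    SELECT dttm, id, value,
           ROW_NUMBER() OVER (
               PARTITION BY substr(dttm, 1, 13), id  -- 'YYYY-MM-DD HH' hour bucket
               ORDER BY dttm DESC
           ) AS rn
    FROM tablename
) AS a
WHERE rn = 1
ORDER BY dttm
"""


def last_row_per_hour_and_id(rows):
    """Keep the latest row for each (hour bucket, id) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (dttm TEXT, id TEXT, value INTEGER)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(last_row_per_hour_and_id(SAMPLE))
print(last_row_per_hour_and_id(SAME_HOUR))
```

With the question's sample the two part1 runs fall into different hours, so all four expected rows come back. With the hypothetical SAME_HOUR rows only the later part1 row survives, so this approach seems to fit only when an ID runs at most once per hour.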

Upvotes: 0
