user18466310

Reputation: 81

Removing the current row from a table if the column value of the previous row is the same (Snowflake)

I have a table as follows:

ID  ACTIVE_STATUS  DATE
45  TRUE           2022-06-12
45  TRUE           2022-06-13
45  FALSE          2022-07-01
36  TRUE           2022-08-01
36  FALSE          2022-08-02
36  FALSE          2022-08-14
36  TRUE           2022-08-15
14  TRUE           2022-03-25
14  TRUE           2022-03-28
14  TRUE           2022-03-29

Within each ID group, I would like to remove a row whenever its ACTIVE_STATUS value is the same as the value in the previous row. In other words, I want to keep only the rows where ACTIVE_STATUS shows a change within each ID group.

For example, for ID 45 the active status was TRUE on 2022-06-12 and stayed TRUE until it became FALSE on 2022-07-01, so I would delete the TRUE row dated 2022-06-13, since the status did not change between that row and the previous one. The data is currently ordered by DATE within each ID group. I would like the output to look like this:

ID  ACTIVE_STATUS  DATE
45  TRUE           2022-06-12
45  FALSE          2022-07-01
36  TRUE           2022-08-01
36  FALSE          2022-08-02
36  TRUE           2022-08-15
14  TRUE           2022-03-25

I currently have:

SELECT ID, ACTIVE_STATUS, DATE 
FROM MY_TABLE 
GROUP BY ID, ACTIVE_STATUS, DATE   
ORDER BY DATE;

But I am not sure how to use lag() or a partition to achieve this. Any help would be great!

Upvotes: 0

Views: 506

Answers (2)

Rajat

Reputation: 5803

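Here's how you would use lag. You can also keep the lag expression in the select list to see what it's doing. QUALIFY is to window functions what HAVING is to GROUP BY: it filters rows on the result of a window function. The second condition keeps the first row of each ID, where lag returns NULL because there is no previous row.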

select *
from your_table
qualify lag(active_status) over(partition by id order by date) <> active_status or
        lag(active_status) over(partition by id order by date) is null
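
To see what the filter above is comparing, a minimal inspection sketch is to select the lagged value next to the current one without filtering anything. The alias prev_status is a name introduced here for illustration; your_table is the placeholder used in this answer.

```sql
-- Inspection only: no rows are removed here.
-- prev_status is a hypothetical alias, not a column of the table.
select id,
       active_status,
       date,
       lag(active_status) over (partition by id order by date) as prev_status
from your_table
order by id, date;
```

For the first row of each ID, prev_status is NULL, which is why the filter needs the extra "is null" branch. Rows where prev_status equals active_status are the ones the qualify clause drops.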

If your team is familiar with null-safe comparison operators, you could use IS DISTINCT FROM, which treats NULL as different from TRUE or FALSE instead of returning NULL, and simplify the QUALIFY condition to

select *
from your_table
qualify lag(active_status) over(partition by id order by date) is distinct from active_status
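
The queries above only select the rows to keep. Since the question asks to remove rows from the table itself, one possible sketch is to overwrite the table with the filtered result using Snowflake's INSERT OVERWRITE. This assumes your_table contains exactly the columns ID, ACTIVE_STATUS and DATE, and it is worth trying on a copy of the table first.

```sql
-- Sketch: replace the table contents with only the rows where the status changes.
-- Assumes your_table has exactly the columns ID, ACTIVE_STATUS, DATE,
-- so that select * lines up with the target column list.
insert overwrite into your_table
select *
from your_table
qualify lag(active_status) over (partition by id order by date)
        is distinct from active_status;
```

INSERT OVERWRITE truncates the target and loads the query result in one statement, so the table definition stays in place. Writing the result to a separate table instead is the more cautious route if the original rows may still be needed.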

Upvotes: 1

Lukasz Szozda

Reputation: 175566

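Using the CONDITIONAL_TRUE_EVENT window function, which increments a counter each time the status differs from the previous row, and then keeping the first row of each resulting group: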

WITH cte AS (
  SELECT *,     
    CONDITIONAL_TRUE_EVENT(ACTIVE_STATUS != 
                           LAG(ACTIVE_STATUS) OVER(PARTITION BY ID ORDER BY DATE)) 
    OVER(PARTITION BY ID ORDER BY DATE) AS con_true_event
  FROM tab
)
SELECT *
FROM cte
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID, con_true_event ORDER BY DATE) = 1
ORDER BY ID, DATE;
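
Snowflake also provides CONDITIONAL_CHANGE_EVENT, which numbers the runs directly from the column value, so the explicit LAG comparison may not be needed. The following is a sketch of that variant against the same tab table; the alias change_event is a name introduced here, and the behaviour when ACTIVE_STATUS contains NULLs is not covered and should be checked separately.

```sql
-- Variant sketch: the event number goes up by 1 whenever ACTIVE_STATUS
-- differs from the previous row within the same ID.
-- change_event is a hypothetical alias.
WITH cte AS (
  SELECT *,
    CONDITIONAL_CHANGE_EVENT(ACTIVE_STATUS)
      OVER(PARTITION BY ID ORDER BY DATE) AS change_event
  FROM tab
)
SELECT ID, ACTIVE_STATUS, DATE
FROM cte
-- Keep the earliest row of each run of identical statuses.
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID, change_event ORDER BY DATE) = 1
ORDER BY ID, DATE;
```

As with the query above, every run of consecutive equal statuses shares one event number per ID, and ROW_NUMBER picks the earliest row of each run.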

Sample data:

CREATE OR REPLACE TABLE tab(ID INT, 
                            ACTIVE_STATUS BOOLEAN,
                            DATE DATE)
AS
          SELECT 45,    TRUE    ,'2022-06-12'
UNION ALL SELECT 45,    TRUE    ,'2022-06-13'
UNION ALL SELECT 45,    FALSE   ,'2022-07-01'
UNION ALL SELECT 36,    TRUE    ,'2022-08-01'
UNION ALL SELECT 36,    FALSE   ,'2022-08-02'
UNION ALL SELECT 36,    FALSE   ,'2022-08-14'
UNION ALL SELECT 36,    TRUE    ,'2022-08-15'
UNION ALL SELECT 14,    TRUE    ,'2022-03-25'
UNION ALL SELECT 14,    TRUE    ,'2022-03-28'
UNION ALL SELECT 14,    TRUE    ,'2022-03-29';

Output: [image not preserved]

Before filtering: [image not preserved]

Upvotes: 1
