Saqib Ali
Saqib Ali

Reputation: 4400

MATCH_RECOGNIZE to find events that do not end with certain pattern

I have the transaction audit history as following. All audits start with a INIT and may have a REVERSAL and a MODIFICATION. I need to find all the audits with INIT -> REVERSAL but no MODIFICATION

TRANSACTION_ID ACTION AMOUNT AUDIT_TS
0 INIT 14 2022-07-10 14:49:17
111 INIT 10 2022-07-10 14:03:09
111 REVERSAL 10 2022-07-10 14:24:10
111 MODIFICATION 8 2022-07-10 14:49:11
222 INIT 12 2022-07-10 14:07:12
222 REVERSAL 12 2022-07-10 14:24:12
222 MODIFICATION 9 2022-07-10 14:43:13
333 REVERSAL 15 2022-07-10 14:26:14
333 MODIFICATION 14 2022-07-10 14:43:15
444 INIT 14 2022-07-10 14:42:16
555 INIT 5 2022-07-10 14:08:18
555 REVERSAL 5 2022-07-10 14:26:19

I tried using the Regex End Anchor ($) as following, but it also returns the audits that end with MODIFICATION:

select 
 *
from audit_table
match_recognize(
    partition by transaction_id
    order by audit_ts
    ALL ROWS PER MATCH
    PATTERN (INIT REVERSAL$)
    DEFINE
          INIT as iff(ACTION='INIT',TRUE,FALSE),
          REVERSAL as iff(ACTION='REVERSAL',TRUE,FALSE),
          MODIFICATION as iff(ACTION='MODIFICATION',TRUE,FALSE)
);

Please advise.

Upvotes: 2

Views: 73

Answers (2)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

Another why to think about this, is to just write the logic you want:

with data(TRANSACTION_ID,   ACTION, AMOUNT, AUDIT_TS) as (
    select 
        column1, column2, column3, try_to_timestamp(column4) 
    from values
    (0      ,'INIT',    14, '2022-07-10 14:49:17'),
    (111    ,'INIT',    10, '2022-07-10 14:03:09'),
    (111    ,'REVERSAL',    10, '2022-07-10 14:24:10'),
    (111    ,'MODIFICATION',    8,  '2022-07-10 14:49:11'),
    (222    ,'INIT',    12, '2022-07-10 14:07:12'),
    (222    ,'REVERSAL',    12, '2022-07-10 14:24:12'),
    (222    ,'MODIFICATION',    9,  '2022-07-10 14:43:13'),
    (333    ,'REVERSAL',    15, '2022-07-10 14:26:14'),
    (333    ,'MODIFICATION',    14, '2022-07-10 14:43:15'),
    (444    ,'INIT',    14, '2022-07-10 14:42:16'),
    (555    ,'INIT',    5,  '2022-07-10 14:08:18'),
    (555    ,'REVERSAL',    5,  '2022-07-10 14:26:19')
)
select
    *
    ,first_value(ACTION) over(partition by TRANSACTION_ID order by AUDIT_TS) = 'INIT' as l1
    ,last_value(ACTION) over(partition by TRANSACTION_ID order by AUDIT_TS) = 'REVERSAL' as l2
    ,count(ACTION) over(partition by TRANSACTION_ID) = 2 as l3
    ,l1 and l2 and l3 logic_you_want
from data
order by 1,4;
TRANSACTION_ID ACTION AMOUNT AUDIT_TS L1 L2 L3 LOGIC_YOU_WANT
0 INIT 14 2022-07-10 14:49:17.000 TRUE FALSE FALSE FALSE
111 INIT 10 2022-07-10 14:03:09.000 TRUE FALSE FALSE FALSE
111 REVERSAL 10 2022-07-10 14:24:10.000 TRUE FALSE FALSE FALSE
111 MODIFICATION 8 2022-07-10 14:49:11.000 TRUE FALSE FALSE FALSE
222 INIT 12 2022-07-10 14:07:12.000 TRUE FALSE FALSE FALSE
222 REVERSAL 12 2022-07-10 14:24:12.000 TRUE FALSE FALSE FALSE
222 MODIFICATION 9 2022-07-10 14:43:13.000 TRUE FALSE FALSE FALSE
333 REVERSAL 15 2022-07-10 14:26:14.000 FALSE FALSE TRUE FALSE
333 MODIFICATION 14 2022-07-10 14:43:15.000 FALSE FALSE TRUE FALSE
444 INIT 14 2022-07-10 14:42:16.000 TRUE FALSE FALSE FALSE
555 INIT 5 2022-07-10 14:08:18.000 TRUE TRUE TRUE TRUE
555 REVERSAL 5 2022-07-10 14:26:19.000 TRUE TRUE TRUE TRUE

thus:

select
    *
from data
qualify first_value(ACTION) over(partition by TRANSACTION_ID order by AUDIT_TS) = 'INIT' 
    and last_value(ACTION) over(partition by TRANSACTION_ID order by AUDIT_TS) = 'REVERSAL'
    and count(ACTION) over(partition by TRANSACTION_ID) = 2
order by 1,4;

gives:

TRANSACTION_ID ACTION AMOUNT AUDIT_TS
555 INIT 5 2022-07-10 14:08:18.000
555 REVERSAL 5 2022-07-10 14:26:19.000

Upvotes: 1

Lukasz Szozda
Lukasz Szozda

Reputation: 175566

The pattern needs to be altered using quantifiers ^INIT{1} REVERSAL{1}$:

select  *
from audit_table
match_recognize(
    partition by transaction_id
    order by audit_ts
    ALL ROWS PER MATCH
    PATTERN (^INIT{1} REVERSAL{1}$)
    DEFINE
         INIT as ACTION='INIT',
         REVERSAL as ACTION='REVERSAL',
         MODIFICATION as ACTION='MODIFICATION'
)
ORDER BY transaction_id, audit_ts;

Snowflake supports boolean data type natively so:

INIT as iff(ACTION='INIT',TRUE,FALSE),
<=>
INIT as ACTION='INIT',

Output:

enter image description here

Upvotes: 2

Related Questions