lalaland
lalaland

Reputation: 341

How to determine if records (for a certain set) follow a correct order?

I have a table that contains 3 different status'

1. CLICKED
2. CLAIMED
3. BOUGHT

In this particular order. I am trying to determine if there were any records that did not occur in the right order based on their date.

For example, this is the data:

enter image description here

Record 121144 has correct order status, this is good.
Record 121200 is incorrect since bought happens before clicked even if clicked and claimed follow the right order.
Record 121122 is incorrect, since CLICKED status comes after CLAIMED.
Record 121111 also has correct order status (even if they are the same).
Record 121198 is also correct since the status order follows, even if there is no BOUGHT.
CREATE TABLE TBL_A 
(
    number_id int, 
    country varchar(50), 
    status varchar(50), 
    datetime date
);

INSERT INTO TBL_A 
VALUES (121144, 'USA', 'CLICKED', '2021-10-09'),
       (121144, 'USA', 'CLAIMED', '2021-10-10'),
       (121144, 'USA', 'BOUGHT', '2021-10-11'),
       (121111, 'CAD', 'CLICKED', '2021-10-12'),
       (121111, 'CAD', 'CLAIMED', '2021-10-12'),
       (121111, 'CAD', 'BOUGHT', '2021-10-12'),
       (121122, 'PES', 'CLICKED', '2021-09-11'),
       (121122, 'PES', 'CLAIMED', '2021-09-09'),
       (121122, 'PES', 'BOUGHT', '2021-09-12'),
       (121198, 'AU', 'CLICKED', '2021-09-11'),
       (121198, 'AU', 'CLAIMED', '2021-09-12'),
       (121200, 'POR', 'CLICKED', '2021-09-10'),
       (121200, 'POR', 'CLAIMED', '2021-09-11'),
       (121200, 'POR', 'BOUGHT', '2021-09-08');

Upvotes: 1

Views: 94

Answers (3)

Rajat
Rajat

Reputation: 5803

Here is one way using some string aggregation and manipulations. This works as expected against the sample data and also accounts for edge cases that include skipping status, missing status, and single status.

with cte as

(select *,listagg(status,'>') within group (order by datetime,charindex(status,'CLICKED>CLAIMED>BOUGHT')) over (partition by number_id, country) as event_order
from t)

select distinct 
       number_id,
       country, 
       case when charindex(event_order,'CLICKED>CLAIMED>BOUGHT,CLICKED>BOUGHT')>0 then 'Ordered' else 'Unordered' end as order_flag
from cte
order by number_id;

   

Upvotes: 0

shawnt00
shawnt00

Reputation: 17915

My answer includes the potential for skipped steps which OP mentioned in comments. Rather than using a strictly matching sequence this approach looks for adjacent pairs where the preceding step was numbered higher:

with A as (
    select *,        
        case status
            when 'CLICKED' then 1
            when 'CLAIMED' then 2
            when 'BOUGHT'  then 3 end as desired_order
    from T
), B as (
    select *,
        row_number() over (
            partition by number_id
            order by datetime, desired_order) as rn -- handles date ties
    from A
), C as (
    select *,
        -- look for pairs of rows where one is reversed
        case when lag(desired_order) over (partition by number_id order by rn) >
            desired_order then 'Y' end as flag
    from B
)
select number_id, min(country) as country,
    case min(flag) when 'Y' then 'Out of order' else 'In order' end as "status"
from C
group by number_id;

https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=f0ee1de8e8e81229ddc23acc97bce7d7

As Thorston pointed out, you could also take the approach of generating a pair of row numbers and then comparing the two looking for mismatches. Glancing at the query plans this may involve an extra sort operation so it would be worth trying both ways on your data.

...
), B as (
    select *,
        row_number() over (
            partition by number_id
            order by desired_order) as rn1,
        row_number() over (
            partition by number_id
            order by datetime, desired_order) as rn2
    from A
)
select
  number_id, min(country) as country,
  case when max(case when rn1 <> rn2 then 1 else 0 end) = 1
    then 'Out of order' else 'In order' end as status
...

Upvotes: 2

Lukasz Szozda
Lukasz Szozda

Reputation: 175606

Using ARRAY_AGG ordered by datetime:

SELECT number_id, 
    ARRAY_AGG(status) WITHIN GROUP(ORDER BY datetime) AS statuses, -- debug
    CASE WHEN ARRAY_AGG(status) WITHIN GROUP(ORDER BY datetime) 
         IN (ARRAY_CONSTRUCT('CLICKED', 'CLAIMED', 'BOUGHT'),
             ARRAY_CONSTRUCT('CLICKED', 'CLAIMED')) THEN 'In order'
         ELSE 'Out of order'
    END AS status
FROM TBL_A
GROUP BY number_id;

Output:

enter image description here

Upvotes: 1

Related Questions