Olorun
Olorun

Reputation: 481

SQL query group where one column is missing specific data

I have an Oracle SQL database that consists of order book information, e.g.

ORDER_ID TIMESTAMP OPERATION ORDER_STATUS ...
1 00:00:01 INSERT New ...
1 00:00:05 UPDATE Partially Filled ...
2 00:00:07 UPDATE Partially Filled ...
1 00:00:08 CANCEL Filled ...
3 00:00:08 INSERT NEW ...

Now, this data is not always perfectly filled, as in, sometimes there will be order_ids with missing information.

One such problem I'm trying to identify are orders (so, all orders with the same order_ID) that are missing the Operation 'INSERT', e.g. there are order_ids that have the operation 'UPDATE' or 'CANCEL', but not an 'INSERT'. In the example table above, that would be the case for order_id 2 (it has an 'UPDATE', but no 'INSERT')

I was able to perform this analysis by just download the whole data set and analyzing the data with Python, but I'd like to be able to do this (faster) with a direct SQL query, if possible.

Upvotes: 0

Views: 55

Answers (3)

astentx
astentx

Reputation: 6751

You may use conditional aggregation to check if some status exists along with others.

select order_id
from sample
where operation in ('INSERT', 'UPDATE', 'CANCEL', 'OTHER_STATUS_TO_CHECK_TOGETHER_WITH_INSERT')
group by order_id
having count(case operation when 'INSERT' then 1 end) = 0

For your sample data it returns

ORDER_ID
2

fiddle

Upvotes: 1

MT0
MT0

Reputation: 168371

From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row pattern matching to determine whether there is an INSERT operation before an UPDATE or CANCEL operation:

SELECT order_id
FROM   order_book
MATCH_RECOGNIZE(
  PARTITION BY order_id
  ORDER     BY timestamp
  MEASURES
    COUNT(insert_op.operation) AS has_insert
  PATTERN ( insert_op? (update_op|cancel_op)+ )
  DEFINE
    insert_op AS operation = 'INSERT',
    update_op AS operation = 'UPDATE',
    cancel_op AS operation = 'CANCEL'
)
WHERE has_insert = 0;

Which, for the sample data:

CREATE TABLE order_book (ORDER_ID, TIMESTAMP, OPERATION, ORDER_STATUS ) AS
SELECT 1, '00:00:01', 'INSERT', 'New'              FROM DUAL UNION ALL
SELECT 1, '00:00:05', 'UPDATE', 'Partially Filled' FROM DUAL UNION ALL
SELECT 2, '00:00:07', 'UPDATE', 'Partially Filled' FROM DUAL UNION ALL
SELECT 1, '00:00:08', 'CANCEL', 'Filled'           FROM DUAL UNION ALL
SELECT 3, '00:00:08', 'INSERT', 'NEW'              FROM DUAL;

Outputs:

ORDER_ID
2

Or, if you want groups where an UPDATE or CANCEL operation is first in the group then you can use:

SELECT order_id
FROM   order_book
MATCH_RECOGNIZE(
  PARTITION BY order_id
  ORDER     BY timestamp
  PATTERN ( ^ (update_op|cancel_op) )
  DEFINE
    update_op AS operation = 'UPDATE',
    cancel_op AS operation = 'CANCEL'
);

Which outputs the same.

fiddle

Upvotes: 1

Stu
Stu

Reputation: 32614

See if the following helps:

Select Order_Id
from t
where Operation in ('UPDATE','CANCEL')
and not exists (
  select * from t t2
  where t2.Order_Id = t.Order_Id and t2.Operation = 'INSERT'
);

Upvotes: 1

Related Questions