Reputation: 278

Refactoring SELECT with two correlated subqueries

Using Oracle 10g, I have a table that looks like this (syntax shortened for brevity):

CREATE TABLE "BUZINESS"."CATALOG" 
(   "ID" NUMBER PRIMARY KEY, 
    "ENTRY_ID" VARCHAR2(40 BYTE) NOT NULL,
    "MSG_ID" VARCHAR2(40 BYTE) NOT NULL, 
    "PUBLISH_STATUS" VARCHAR2(30 BYTE) NOT NULL, /* Can be NEW or PUBLISHED */
     CONSTRAINT "CATALOG_UN1" UNIQUE (ENTRY_ID, MSG_ID)
)

One process, Process A, writes Catalog entries with a PUBLISH_STATUS of 'NEW'. A second process, Process B, then comes in, grabs all 'NEW' messages, and then changes the PUBLISH_STATUS to 'PUBLISHED'.

I need to write a query that will grab all PUBLISH_STATUS='NEW' rows, BUT

I'm trying to prevent an out of order fetch, so that if Process B marks a row as PUBLISH_STATUS='PUBLISHED' with MSG_ID '1000', and then Process A writes an out of order row as PUBLISH_STATUS='NEW' with MSG_ID '999', the query will never fetch that row when grabbing all 'NEW' rows.

So, if I start with the data:

INSERT INTO BUZINESS.CATALOG VALUES (1, '1000', '999', 'NEW');
INSERT INTO BUZINESS.CATALOG VALUES (2, '1000', '1000', 'PUBLISHED');
INSERT INTO BUZINESS.CATALOG VALUES (3, '1000', '1001', 'NEW');

INSERT INTO BUZINESS.CATALOG VALUES (4, '2000', '1999', 'NEW');
INSERT INTO BUZINESS.CATALOG VALUES (5, '2000', '2000', 'PUBLISHED');
INSERT INTO BUZINESS.CATALOG VALUES (6, '2000', '2001', 'NEW');

INSERT INTO BUZINESS.CATALOG VALUES (7, '3000', '3001', 'NEW');

Then my query should grab only rows with ID: 3, 6, 7

I then have to join these rows with other data, so the result needs to be JOINable.

So far, I have a very large, ugly query UNIONing two correlated subqueries to do this. Could someone help me write a better query?

Upvotes: 2

Answers (4)

Martien

Reputation: 17

Although this is a very old post i still feel the need to reply here as i suspect this is based on misconception/misunderstanding. Oracle like many other RDBMSses still holds to the principles of ACID where the I stands for Isolation. No process x will see the result of another process y before y committed and x started after y. So one proces alterering the view of another proces on the data is not possible.

If not convinced run the query that updates and dont commit. Start another session and query the data again and again until the first query changes it. It will never change for the other sessions until you commit your changes in the first session and you will read the snapshot of the data in it's state it was when you started the query before the other process committed it for all to see.

Upvotes: 0

Bohemian

Reputation: 425073

Requiring non-presence of joinable data is best solved with an outer join that filters out matching joins (leaving just the non-matches).

In your case, the join condition is a "published" row for the same entry with a later (higher) message if.

This query produces your desired output:

select t1.*
from buziness_catalog t1
left join buziness_catalog t2
    on t2.entry_id = t1.entry_id
    and to_number(t2.msg_id) > to_number(t1.msg_id)
    and t2.publish_status = 'PUBLISHED'
where t1.publish_status = 'NEW'
and t2.id is null
order by t1.id

See live demo of this query working with your sample data to produce the your desired output. Note that is used a table name of "buziness_catalog" rather than "buziness.catalog" so the demo would run - you'll have to change the underscores back to dots.

Being a join, and not based on an exists correlated subquery, this will perform quite well.

This query would have been a little simpler had your msg_id column been a numeric type (the conversion from character to numeric would not have been needed). If your ID data is actually numeric, consider changing the datatype of entry_id and msg_id to a numeric type.

Upvotes: 2

watery

Reputation: 5487

@Laurence 's query looks good, but just to satisfy my curiosity, do you mind EXPLAINing this query too?

I think that those numbers stored as varchar will kill your index usage capabilities when in TO_NUMBER(), but I'm not sure about Oracle, so you better check that.
In case they do, you can always add additional number columns that you update with a trigger when rows are edited — so that you don't break the original design.

SELECT * 
FROM buziness b1
WHERE PUBLISH_STATUS = 'NEW'
    AND TO_NUMBER(msg_id) > COALESCE((
        SELECT MAX(TO_NUMBER(msg_id)) 
        FROM buziness b2
        WHERE PUBLISH_STATUS = 'PUBLISHED'
        AND b2.entry_id = b1.entry_id
    ), 0)

Upvotes: 1

Laurence

Reputation: 10976

Reading between the lines, I think this might work:

select
    *
from
    buziness.catalog b1
where 
    b1.publish_status = 'NEW' and
    not exists (
        select
            'x'
        from
            buziness.catalog b2
        where
            b1.entry_id = b2.entry_id and
            b2.publish_status = 'PUBLISHED' and
            to_number(b2.msg_id) > to_number(b1.msg_id) -- store numbers as numbers!
    );

Upvotes: 1

Refactoring SELECT with two correlated subqueries

Answers (4)

Related Questions