thecoop
thecoop

Reputation: 46128

Returning results from a NOT IN query very quickly

We've got a system where we've got a collection of items (> 1 million), and several things processing it. Each processor should only process each item once, and the processors have a heirarchy.

Our current implementation is to have a 'processed' table keeping track of what each processor has already done:

CREATE TABLE items (id NUMBER PRIMARY KEY, ...)
CREATE TABLE itemsProcessed(
    item NUMBER REFERENCES items(id),
    processor NUMBER)

our query is this (itemsProcessed has got relevant indexes on it) - we're using a NOT IN to filter out items that have already been processed by the current processor or it's ancestors:

SELECT ... FROM items i WHERE <additional queries on items>
    AND id NOT IN (SELECT item FROM itemsProcessed WHERE processor IN (1, 2))

When the processed table gets very big, this query starts taking a long time (several seconds), as it has to do a lot of filtering before it starts returning the first item (the query plan is using a hash antijoin)

We need this query to return the first few items very quickly - ideally return the first items under 500ms. This means it can't iterate over items and filter out those in itemsProcessed. So we need some way of doing a negation index on the join of items and itemsProcessed (we've accomplished this on mongo, but oracle doens't seem to be able to do a similar thing)

Is this possible with Oracle?

Upvotes: 0

Views: 72

Answers (3)

David Brenchley
David Brenchley

Reputation: 86

Depending on how often the table is updated, you could create a materialized view of itemsNotProcessed. The processing would be done beforehand. You could also denormalize a bit and add a processed flag to the items table and add a bitmap index on the flag.

Upvotes: 0

schurik
schurik

Reputation: 7928

you can try to add a /*+ first_rows */ hint to you query

SELECT /*+ first_rows (10) */... FROM items i ...

or try to select first the unprocessed items and than do <additional queries on items>

with i_to_process AS
(
  SELECT item  FROM items
  minus 
  SELECT item FROM itemsProcessed WHERE processor IN (1, 2)
)
select * from i_to_process
where 
<additional queries on items>

Upvotes: 1

IMO this is a design problem. You're trying to exclude items which have already been processed when you should be trying to include items which have not been processed. The list of items which have been processed is going to grow continually; the list of items to be processed will remain small. Rather than having a table of items which have been processed (itemsProcessed) I suggest you create a table of items to be processed and then inner-join it to the query, deleting items from the ITEMS_TO_BE_PROCESSED table as they're processed.

Best of luck.

Upvotes: 0

Related Questions