Reputation: 46128
We've got a system where we've got a collection of items (> 1 million), and several things processing it. Each processor should only process each item once, and the processors have a heirarchy.
Our current implementation is to have a 'processed' table keeping track of what each processor has already done:
CREATE TABLE items (id NUMBER PRIMARY KEY, ...)
CREATE TABLE itemsProcessed(
item NUMBER REFERENCES items(id),
processor NUMBER)
our query is this (itemsProcessed
has got relevant indexes on it) - we're using a NOT IN to filter out items that have already been processed by the current processor or it's ancestors:
SELECT ... FROM items i WHERE <additional queries on items>
AND id NOT IN (SELECT item FROM itemsProcessed WHERE processor IN (1, 2))
When the processed table gets very big, this query starts taking a long time (several seconds), as it has to do a lot of filtering before it starts returning the first item (the query plan is using a hash antijoin)
We need this query to return the first few items very quickly - ideally return the first items under 500ms. This means it can't iterate over items
and filter out those in itemsProcessed
. So we need some way of doing a negation index on the join of items
and itemsProcessed
(we've accomplished this on mongo, but oracle doens't seem to be able to do a similar thing)
Is this possible with Oracle?
Upvotes: 0
Views: 72
Reputation: 86
Depending on how often the table is updated, you could create a materialized view of itemsNotProcessed. The processing would be done beforehand. You could also denormalize a bit and add a processed flag to the items table and add a bitmap index on the flag.
Upvotes: 0
Reputation: 7928
you can try to add a /*+ first_rows */ hint to you query
SELECT /*+ first_rows (10) */... FROM items i ...
or try to select first the unprocessed items and than do <additional queries on items>
with i_to_process AS
(
SELECT item FROM items
minus
SELECT item FROM itemsProcessed WHERE processor IN (1, 2)
)
select * from i_to_process
where
<additional queries on items>
Upvotes: 1
Reputation: 50047
IMO this is a design problem. You're trying to exclude items which have already been processed when you should be trying to include items which have not been processed. The list of items which have been processed is going to grow continually; the list of items to be processed will remain small. Rather than having a table of items which have been processed (itemsProcessed) I suggest you create a table of items to be processed and then inner-join it to the query, deleting items from the ITEMS_TO_BE_PROCESSED table as they're processed.
Best of luck.
Upvotes: 0