Reputation: 15876
Products
product_id product_serial_number product_status
1 X123 PENDING
1 X123 PROCESSED
2 X345 PENDING
3 X678 PENDING
4 Y890 PENDING
4 Y890 PROCESSED
The above table shows the status of a product and its history. I need to produce a report with the output to be as shown below:
product_id status
1 UPDATE
2 NEW
3 NEW
4 UPDATE
I.e. if a product has previously been processed (e.g. products 1 and 4) its status is UPDATE otherwise its status is NEW.
I have come up with this query but i am not happy with its performance:
select product_id, 'UPDATE'
from products p1
where product_id in (select product_id from products p2 where p2.product_status='PROCESSED' and p2.product_status='ARCHIVED')
Union
select product_id, 'NEW'
from products p1
where product_id not in (select product_id from products p2 where p2.product_status='PROCESSED' and p2.product_status='ARCHIVED')
Another approach that could also work is to join the table to itself:
select p1.product_id, decode(p2.product_id, null, 'NEW','UPDATE')
from products p1, products p2
where p1.product_id=p2.product_id(+)
and p1.product_serial_number=p2.serial_number(+)
and p2.product_status(+) = 'PROCESSED'
When either of the queries are run against a large data set, the performance is not very good. How can i improve (or even change completely) the above queries for best performance?
Upvotes: 1
Views: 1050
Reputation: 2957
In addition to previous answers (I like the one about INTERSECT anв MINUS).
For very small quantity of the possible values of field 'product_status' an ordinary indexes (based on B-Tree) aren't working well. You need to use a bitmap index on this field. Oracle Bitmap Index Techniques CREATE INDEX in Oracle Docs - search for bitmap
Upvotes: 1
Reputation: 27427
Try this
with CTE as
(
select product_id, decode(product_status,'PROCESSED','UPDATE','NEW') status,
row_number() over (partition by product_id
order by decode(product_status,'PROCESSED','UPDATE','NEW') desc) rnum
from products p1
)
select * from cte where rnum = 1
Upvotes: 1
Reputation: 4748
Have you tried using GROUP BY
?
SELECT product_id, (CASE WHEN COUNT(*) = 1 THEN 'NEW' ELSE 'UPDATED' END) status
FROM products
WHERE product_status <> 'ARCHIVED'
GROUP BY product_id
Check out other GROUP BY
aggregate functions.
Edit
Fixed issue with Case expression syntax. Sorry about that.
Upvotes: 3
Reputation: 26333
You may get better speed using MINUS and INTERSECT, which are overlooked cousins of UNION.
All the products that have a PENDING and a PROCESSED row:
SELECT product_id FROM Products WHERE product_status = 'PENDING'
INTERSECT SELECT product_id FROM Products WHERE product_status = 'PROCESSED'
All the products that have a PENDING row but not a PROCESSED row:
SELECT product_id FROM Products WHERE product_status = 'PENDING'
MINUS SELECT product_id FROM Products WHERE product_status = 'PROCESSED'
Put them together (and add the NEW/UPDATE):
SELECT product_id, 'NEW' FROM (
SELECT product_id FROM Products WHERE product_status = 'PENDING'
MINUS SELECT product_id FROM Products WHERE product_status = 'PROCESSED')
UNION
SELECT product_id, 'UPDATE' FROM (
SELECT product_id FROM Products WHERE product_status = 'PENDING'
INTERSECT SELECT product_id FROM Products WHERE product_status = 'PROCESSED')
For a large table you're going to have at least 2/3 of the rows involved so the query will never be super fast.
If you plan to run this query a lot, you may also want to consider an index on product_status
.
Upvotes: 1