Reputation: 153
We're experiencing very slow performance updating a table in Oracle 9i when a Select statement to return the same rows takes less than 20 seconds. When I say very slow, we have left it running overnight and it still did not finish.
I've simplified the tables and columns to illustrate the problem and to hide the names as they are tables used in a third-party app (however, the data is normally maintained in-house and support calls are expensive). Even in this simple example, the update didn't finish after running for over 4 hours.
CREATE TABLE lookup
(
lookup_key NUMBER(10,0) PRIMARY KEY,
lookup_name VARCHAR2(30)
);
There are c. 34,000 rows in this table
CREATE TABLE products
(
product_key NUMBER(10,0) PRIMARY KEY,
product_val NUMBER(10,0)
);
There are c. 14.5 million rows in this table
The Select statement is:
SELECT * FROM lookup
WHERE lookup_name <> 'Redundant'
AND NOT EXISTS (SELECT 1 FROM products
where product_val = lookup_key);
The Update statement is:
UPDATE lookup
SET lookup_name = 'Redundant'
WHERE lookup_name <> 'Redundant'
AND NOT EXISTS (SELECT 1 FROM products
WHERE product_val = lookup_key);
The execution plan is the same between the 2 both in this trivial example and the real world one. All statistics are up to date.
I guess we could do it in PL/SQL but surely it should be quick enough to update all the rows in lookup, that aren't in the product table, to 'Redundant' in SQL? Any ideas or advice gratefully received thanks.
Upvotes: 0
Views: 3810
Reputation: 21075
Here is your sample setup
CREATE TABLE lookup
as select
rownum lookup_key, 'xxxxx'||rownum as lookup_name
from dual connect by level <= 34000;
CREATE TABLE products
as
with a as
(select rownum i
from dual connect by level <= 1450000),
b as
(select /*+ MATERIALIZE */ rownum j
from dual connect by level <= 10)
select
i*10 + j + 34000 product_key, i*10 + j + 34000 as product_val
from a cross join b;
The update gives this execution plan - note the HASH JOIN ANTI
this is a effectiv plan for such a query. If you see something else (e..g FILTER
) this is a problem, you perform the FULL TABLE SCAN
once for each updatetd row.
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 32957 | 1609K| 554 (3)| 00:00:10 |
|* 1 | HASH JOIN ANTI | | 32957 | 1609K| 554 (3)| 00:00:10 |
|* 2 | TABLE ACCESS FULL| LOOKUP | 32957 | 1190K| 19 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| PRODUCTS | 1320K| 16M| 530 (2)| 00:00:10 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("PRODUCT_VAL"="LOOKUP_KEY")
2 - filter("LOOKUP_NAME"<>'Redundant')
And here the update taking less than 5 seconds to update all rows of the lookup table
SQL> UPDATE lookup
2 SET lookup_name = 'Redundant'
3 WHERE lookup_name <> 'Redundant'
4 AND NOT EXISTS (SELECT 1 FROM products
5 WHERE product_val = lookup_key);
34000 rows updated.
Elapsed: 00:00:04.97
Note that this is a defensive strategy if you expect lots of updates. The total cost of the update is comparable to a one FULL TABLE SCAN
on the big table.
In case you expects a small number of updates you may profit from the index proposed in other answer.
create index products_idx on products(PRODUCT_VAL);
The execution plan changes to NESTED LOOPS ANTI
:
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | 32 | 13 (0)| 00:00:01 |
| 1 | UPDATE | LOOKUP | | | | |
| 2 | NESTED LOOPS ANTI | | 1 | 32 | 13 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| LOOKUP | 1 | 19 | 11 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | PRODUCTS_IDX | 14M| 178M| 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("LOOKUP_NAME"<>'Redundant')
4 - access("PRODUCT_VAL"="LOOKUP_KEY")
The update of one row goes instantly ...
SQL> UPDATE lookup
2 SET lookup_name = 'Redundant'
3 WHERE lookup_name <> 'Redundant'
4 AND NOT EXISTS (SELECT 1 FROM products
5 WHERE product_val = lookup_key);
1 row updated.
Elapsed: 00:00:00.13
... but the elapsed time will lineary increase with the numer of updated rows (and for few thousends the cost will increase the cost of the FTS in the previous execution plan.
Upvotes: 2
Reputation: 4424
Add an index to PRODUCT_VAL
to avoid scanning the PRODUCTS
table for each LOOKUP
Upvotes: 0
Reputation: 4818
For test purpose I would do:
create table lookup2 as select * from lookup
And run your update.
UPDATE lookup2
SET lookup_name = 'Redundant'
WHERE lookup_name <> 'Redundant'
AND NOT EXISTS (SELECT 1 FROM products
WHERE product_val = lookup_key);
If that goes fast it means slow down is due to index rebuild on lookup_name, triggers, chained rows or other problems tablespace. If that goes slow it means problem is query itself or there are problems with physical disk performance (what is hard to believe with 34k rows).
I also wonder how this corellated not exists works here. maybe try:
create table tmp as
SELECT lookup_key FROM lookup
WHERE lookup_name <> 'Redundant'
AND NOT EXISTS (SELECT 1 FROM products
where product_val = lookup_key);
merge into lookup l
using (select * from tmp) t
on (l.lookup_key = t.lookup_key)
when matched then update set lookup_name = 'Redundant'
WHERE lookup_name <> 'Redundant';
drop table tmp;
Or the select was fast only for first 100 rows not all of them?
Upvotes: 1