Reputation: 28929
I am performing an update with a query like this:
UPDATE (SELECT h.m_id,
m.id
FROM h
INNER JOIN m
ON h.foo = m.foo)
SET m_id = id
WHERE m_id IS NULL
Some info:
h
is roughly ~5 million rowsh
have NULL
values for m_id
m
is roughly ~500 thousand rowsm_id
on table h
is an indexed foreign key pointing to id
on table m
id
on table m
is the primary keym.foo
and h.foo
The EXPLAIN PLAN
for this query indicated a hash join and full table scans, but I'm no DBA, so I can't really interpret it very well.
The query itself ran for several hours and did not complete. I would have expected it to complete in no more than a few minutes. I've also attempted the following query rewrite:
UPDATE h
SET m_id = (SELECT id
FROM m
WHERE m.foo = h.foo)
WHERE m_id IS NULL
The EXPLAIN PLAN
for this mentioned ROWID lookups and index usage, but it also went on for several hours without completing. I've also always been under the impression that queries like this would cause the subquery to be executed for every result from the outer query's predicate, so I would expect very poor performance from this rewrite anyway.
Is there anything wrong with my approach, or is my problem related to indexes, tablespace, or some other non-query-related factor?
Edit:
I'm also having abysmal performance from simple count queries like this:
SELECT COUNT(*)
FROM h
WHERE m_id IS NULL
These queries are taking anywhere from ~30 seconds to sometimes ~30 minutes(!).
I am noticing no locks, but the tablespace for these tables is sitting at 99.5% usage (only ~6MB free) right now. I've been told that this shouldn't matter as long as indexes are being used, but I don't know...
Upvotes: 2
Views: 8901
Reputation: 30845
For large tables, MERGE is often much faster than UPDATE. Try this (untested):
MERGE INTO h USING
(SELECT h.h_id,
m.id as new_m_id
FROM h
INNER JOIN m
ON h.foo = m.foo
WHERE h.m_id IS NULL
) new_data
ON (h.h_id = new_data.h_id)
WHEN MATCHED THEN
UPDATE SET h.m_id = new_data.new_m_id;
Upvotes: 1
Reputation: 53565
I would update the table in iterations, for example, add a condition according to where h.date_created > sysdate-30
and after it finishes I would run the same query and change the condition to: where h.date_created between sysdate-30 and sysdate-60
etc. If you don't have a column like date_created
maybe there's another column you can filter by ? for example: WHERE m.foo = h.foo AND m.foo between 1 and 10
Only the result of plan
can explain why the cost of this update is high, but an educated guess will be that both tables are very big and that there are many NULL
values as well as a lot of matching (m.foo = h.foo
)...
Upvotes: 0
Reputation: 22524
Some points:
Oracle does not index NULL
values (it will index a NULL
that is part of a globally non-null tuple, but that's about it).
Oracle is going for a HASH JOIN
because of the size of both h
and m
. This is likely the best option performance-wise.
The second UPDATE
might get Oracle to use indexes, but then Oracle is usually smart about merging subqueries. And it would be a worse plan anyway.
Do you have recent, reasonable statistics for your schema? Oracle really needs decent statistics.
In your execution plan, which is the first table in the HASH JOIN
? For best performance it should be the smaller table (m
in your case). If you don't have good cardinality statistics, Oracle will get messed up. You can force Oracle to assume fixed cardinalities with the cardinality
hint, it may help Oracle get a better plan.
For example, in your first query:
UPDATE (SELECT /*+ cardinality(h 5000000) cardinality(m 500000) */
h.m_id, m.id
FROM h
INNER JOIN m
ON h.foo = m.foo)
SET m_id = id
WHERE m_id IS NULL
SELECT COUNT(*)
on an empty table consume 30+ seconds because the table in question had like 250 million deleted rows. If that is the case, I suggest analyzing your specific case with a DBA, so he/she can reclaim space from deleted rows and lower the high water mark.Upvotes: 3
Reputation: 411
Try undocumented hint /*+ BYPASS_UJVC */. If it works, add an UNIQUE/PK constraint on m.foo.
Upvotes: 0
Reputation: 37225
As far as I remember, a WHERE m_id IS NULL
performs a full-table scan, since NULL values cannot be indexed.
Full-table scan means, that the engine needs to read every record in the table to evaluate the WHERE condition, and cannot use an index.
You could try to add a virtual column set to a not-null value if m_id IS NULL
, and index this column, and use this column in the WHERE condition.
Then you could also move the WHERE condition from the UPDATE statement to the sub-select, which will probably make the statement faster.
Since JOINs are expensive, rewriting INNER JOIN m ON h.foo = m.foo
as
WHERE h.foo IN (SELECT m.foo FROM m WHERE m.foo IS NOT NULL)
may also help.
Upvotes: 2