FtDRbwLXw6

Reputation: 28929

Horrible Oracle update performance

I am performing an update with a query like this:

UPDATE (SELECT     h.m_id,
                   m.id
        FROM       h
        INNER JOIN m
        ON         h.foo = m.foo)
SET    m_id = id
WHERE  m_id IS NULL

Some info:

The EXPLAIN PLAN for this query indicated a hash join and full table scans, but I'm no DBA, so I can't really interpret it very well.

The query itself ran for several hours and did not complete. I would have expected it to complete in no more than a few minutes. I've also attempted the following query rewrite:

UPDATE h
SET    m_id = (SELECT id
               FROM   m
               WHERE  m.foo = h.foo)
WHERE  m_id IS NULL

The EXPLAIN PLAN for this mentioned ROWID lookups and index usage, but it also ran for several hours without completing. I've also always been under the impression that a query like this would cause the subquery to be executed once for every row matched by the outer query's predicate, so I would expect very poor performance from this rewrite anyway.

Is there anything wrong with my approach, or is my problem related to indexes, tablespace, or some other non-query-related factor?

Edit:

I'm also having abysmal performance from simple count queries like this:

SELECT COUNT(*)
FROM   h
WHERE  m_id IS NULL

These queries are taking anywhere from ~30 seconds to sometimes ~30 minutes(!).

I am noticing no locks, but the tablespace for these tables is sitting at 99.5% usage (only ~6MB free) right now. I've been told that this shouldn't matter as long as indexes are being used, but I don't know...

Upvotes: 2

Views: 8901

Answers (5)

Frank Schmitt

Reputation: 30845

For large tables, MERGE is often much faster than UPDATE. Try this (untested):

MERGE INTO h
USING (SELECT h.h_id,
              m.id AS new_m_id
       FROM   h
       INNER JOIN m
       ON     h.foo = m.foo
       WHERE  h.m_id IS NULL) new_data
ON (h.h_id = new_data.h_id)
WHEN MATCHED THEN
  UPDATE SET h.m_id = new_data.new_m_id;

Upvotes: 1

Nir Alfasi

Reputation: 53565

I would update the table in iterations. For example, add a condition like WHERE h.date_created > SYSDATE - 30, and after that finishes, run the same query with the condition changed to WHERE h.date_created BETWEEN SYSDATE - 60 AND SYSDATE - 30, and so on. If you don't have a column like date_created, maybe there's another column you can filter by, for example: WHERE m.foo = h.foo AND m.foo BETWEEN 1 AND 10.
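A sketch of that batched approach (untested; it assumes h really has a date_created column, as suggested above):

```sql
-- Batch 1: most recent 30 days of rows.
UPDATE h
SET    m_id = (SELECT id FROM m WHERE m.foo = h.foo)
WHERE  m_id IS NULL
AND    h.date_created > SYSDATE - 30;
COMMIT;

-- Batch 2: the 30 days before that.
UPDATE h
SET    m_id = (SELECT id FROM m WHERE m.foo = h.foo)
WHERE  m_id IS NULL
AND    h.date_created BETWEEN SYSDATE - 60 AND SYSDATE - 30;
COMMIT;

-- ...and so on, one window at a time.
```

Committing after each batch keeps undo usage small and lets you monitor progress, at the cost of the update no longer being a single atomic statement.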

Only the execution plan can really explain why the cost of this update is so high, but an educated guess is that both tables are very big, that there are many NULL values, and that there are a lot of matching rows (m.foo = h.foo)...

Upvotes: 0

gpeche

Reputation: 22524

Some points:

  • Oracle does not index NULL values (a NULL is only indexed when it is part of a composite index entry that is not entirely NULL, but that's about it).

  • Oracle is going for a HASH JOIN because of the size of both h and m. This is likely the best option performance-wise.

  • The second UPDATE might get Oracle to use indexes, but Oracle is usually smart about merging subqueries anyway, and it would be a worse plan regardless.

  • Do you have recent, reasonable statistics for your schema? Oracle really needs decent statistics.

  • In your execution plan, which is the first table in the HASH JOIN? For best performance it should be the smaller table (m in your case). If you don't have good cardinality statistics, Oracle will get messed up. You can force Oracle to assume fixed cardinalities with the CARDINALITY hint, which may help it choose a better plan.

For example, in your first query:

UPDATE (SELECT /*+ cardinality(h 5000000) cardinality(m 500000) */
               h.m_id, m.id 
        FROM h 
        INNER JOIN m 
        ON h.foo = m.foo) 
SET m_id = id 
WHERE m_id IS NULL

  • In Oracle, a FULL SCAN reads not only every record in the table; it basically reads all storage allocated up to the maximum ever used (the high water mark in Oracle documentation). So if you have had a lot of deleted rows, your tables might need some cleaning up. I have seen a SELECT COUNT(*) on an empty table take 30+ seconds because the table in question had on the order of 250 million deleted rows. If that is the case, I suggest analyzing your specific case with a DBA, so they can reclaim space from the deleted rows and lower the high water mark.
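If the statistics point is the culprit, regathering them is straightforward. A sketch (the schema name here is a placeholder for your own):

```sql
-- Refresh optimizer statistics for both tables, including their indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'H', cascade => TRUE);
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'M', cascade => TRUE);
END;
/
```

With fresh statistics the optimizer can get the hash-join build side and cardinality estimates right without any hints.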

Upvotes: 3

GuiGi

Reputation: 411

Try the undocumented hint /*+ BYPASS_UJVC */. If it works, add a UNIQUE/PK constraint on m.foo.
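Applied to the original statement, that would look something like this (untested; BYPASS_UJVC is undocumented, skips Oracle's key-preserved check on the join view, and has been removed in later Oracle versions):

```sql
UPDATE /*+ BYPASS_UJVC */
       (SELECT h.m_id, m.id
        FROM   h
        INNER JOIN m ON h.foo = m.foo)
SET    m_id = id
WHERE  m_id IS NULL;
```

The supported alternative is to make m.foo provably unique (the UNIQUE/PK constraint above), after which the hint is unnecessary.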

Upvotes: 0

devio

Reputation: 37225

As far as I remember, WHERE m_id IS NULL forces a full-table scan, since entirely-NULL index entries are not stored in the index.

A full-table scan means that the engine must read every record in the table to evaluate the WHERE condition, instead of using an index.

You could add a virtual column that is set to a non-NULL value when m_id IS NULL, index that column, and use it in the WHERE condition.
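A sketch of that idea (untested; the column and index names are made up, and virtual columns require Oracle 11g or later; on older versions a function-based index on the same CASE expression achieves the same effect):

```sql
-- Virtual column that is non-NULL exactly when m_id is missing.
ALTER TABLE h ADD (m_id_missing AS (CASE WHEN m_id IS NULL THEN 1 END));
CREATE INDEX h_m_id_missing_ix ON h (m_id_missing);

-- The slow count can then become an index-driven query:
SELECT COUNT(*) FROM h WHERE m_id_missing = 1;
```

Because the indexed expression is NULL for rows where m_id is populated, only the "missing" rows take up space in the index.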

Then you could also move the WHERE condition from the UPDATE statement to the sub-select, which will probably make the statement faster.

Since JOINs are expensive, rewriting INNER JOIN m ON h.foo = m.foo as

WHERE h.foo IN (SELECT m.foo FROM m WHERE m.foo IS NOT NULL)

may also help.

Upvotes: 2
