Reputation: 24336
The following SQL finds all matching records between two tables with identical schemas and then iterates over the cursor holding that result set, doing a row-by-row insert with a single commit at the end of the function. My question is: how can I get the maximum performance from this kind of query? Code follows:
DECLARE
   CURSOR foo IS
      SELECT *
        FROM tableOne t1
       WHERE EXISTS (SELECT *
                       FROM tableTwo tt
                      WHERE t1.foo = tt.foo
                        AND t1.bar = tt.bar); --THIS TAKES 5 MINUTES (66 MILLION ROWS)
BEGIN
   FOR nextFoo IN foo
   LOOP
      INSERT INTO tracker t
         (id, foo, bar, baz)
      VALUES (trackerSequence.nextval, nextFoo.foo, nextFoo.bar, nextFoo.baz);
   END LOOP;
   COMMIT;
END;
This query can take upwards of an hour, and I am trying to reduce that cost. I will generally be processing 140 million records, so I expect the runtime to roughly double. All columns are indexed.
Version information:
10g 10.2
Upvotes: 0
Views: 16407
Reputation: 24336
I have found that the following will do 130 million inserts in about 49 minutes.
INSERT INTO tracker t
SELECT *
  FROM tableOne t1
 WHERE NOT EXISTS (SELECT *
                     FROM tableTwo tt
                    WHERE t1.foo = tt.foo
                      AND t1.bar = tt.bar);
Upvotes: 0
Reputation: 7932
I have almost always gotten better performance with bulk inserts like this by using a combination of bitmap indexes and a DPL (direct-path load), i.e. the /*+ APPEND */ hint.
I would also assume that with this you have proper indexes on both tableTwo (foo, bar) and tableOne (foo, bar).
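For example, assuming the column names from the question (the index names are made up, and whether a bitmap or a plain B-tree index is appropriate depends on column cardinality and concurrent DML):

CREATE BITMAP INDEX tabletwo_foo_bar_bix ON tableTwo (foo, bar);
CREATE INDEX tableone_foo_bar_ix ON tableOne (foo, bar);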
So something like:
INSERT /*+ APPEND */ INTO tracker t
SELECT trackerSequence.nextval, t1.foo, t1.bar, t1.baz
  FROM tableOne t1
 WHERE EXISTS (SELECT 'x'
                 FROM tableTwo tt
                WHERE t1.foo = tt.foo
                  AND t1.bar = tt.bar);
Also, keep in mind that the EXISTS clause might bite you back under certain situations, so you may want to use a simple join instead:
INSERT /*+ APPEND */ INTO tracker t
SELECT trackerSequence.nextval, x.foo, x.bar, x.baz
  FROM (SELECT DISTINCT t1.foo, t1.bar, t1.baz   -- sequence applied outside the DISTINCT (ORA-02287 otherwise)
          FROM tableOne t1, tableTwo tt
         WHERE t1.foo = tt.foo
           AND t1.bar = tt.bar) x;
Remember: a DPL (direct-path load) will not always improve the performance of your query; it tends to help when the table is properly partitioned.
Run EXPLAIN PLAN on these queries to find out which works best.
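For example, a quick way to compare them on 10g with DBMS_XPLAN (shown here with the first statement from above):

EXPLAIN PLAN FOR
INSERT /*+ APPEND */ INTO tracker t
SELECT trackerSequence.nextval, t1.foo, t1.bar, t1.baz
  FROM tableOne t1
 WHERE EXISTS (SELECT 'x'
                 FROM tableTwo tt
                WHERE t1.foo = tt.foo
                  AND t1.bar = tt.bar);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);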
Also (as one of the answers already mentions), do not wait until the very end to commit, but do not commit on every record either. Use a custom commit point, much like using LIMIT XXXX while BULK COLLECTing. Your commit points govern how much rollback (undo) you generate. You can also implement a custom commit point (as simple as a counter) procedurally, i.e. in a PL/SQL block.
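For illustration, a rough sketch of such a counter-based commit point in a plain PL/SQL block (the 10,000-row interval is an assumed value; tune it to your undo capacity):

DECLARE
   v_count PLS_INTEGER := 0;
BEGIN
   FOR r IN (SELECT t1.foo, t1.bar, t1.baz
               FROM tableOne t1
              WHERE EXISTS (SELECT 'x'
                              FROM tableTwo tt
                             WHERE t1.foo = tt.foo
                               AND t1.bar = tt.bar))
   LOOP
      INSERT INTO tracker (id, foo, bar, baz)
      VALUES (trackerSequence.nextval, r.foo, r.bar, r.baz);

      v_count := v_count + 1;
      IF v_count >= 10000 THEN  -- custom commit point
         COMMIT;                -- fetching across a commit risks ORA-01555 if undo is tight
         v_count := 0;
      END IF;
   END LOOP;
   COMMIT; -- pick up the rows inserted since the last commit point
END;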
Query performance also depends on the HWM (high-water mark) of the table; you would almost always want to stay under the HWM. While a TRUNCATE of the tracker table would help achieve this, the previous data in it would be lost, so that is hardly a solution here. See the AskTom article on finding the HWM for details.
Upvotes: 0
Reputation: 6336
DECLARE
   CURSOR foo_cur IS
      SELECT t1.foo, t1.bar, t1.baz
        FROM tableOne t1
       WHERE EXISTS (SELECT *
                       FROM tableTwo tt
                      WHERE t1.foo = tt.foo
                        AND t1.bar = tt.bar); --THIS TAKES 5 MINUTES (66 MILLION ROWS)

   -- on 10g, FORALL cannot reference fields of a collection of records (PLS-00436),
   -- so use one collection per column
   TYPE foo_t IS TABLE OF tableOne.foo%TYPE;
   TYPE bar_t IS TABLE OF tableOne.bar%TYPE;
   TYPE baz_t IS TABLE OF tableOne.baz%TYPE;
   v_foo foo_t;
   v_bar bar_t;
   v_baz baz_t;
BEGIN
   OPEN foo_cur;
   LOOP
      FETCH foo_cur BULK COLLECT INTO v_foo, v_bar, v_baz LIMIT 1000;
      EXIT WHEN v_foo.COUNT = 0; -- guard against an empty final fetch

      FORALL i IN 1 .. v_foo.COUNT
         INSERT INTO tracker t
            (id, foo, bar, baz)
         VALUES (trackerSequence.nextval, v_foo(i), v_bar(i), v_baz(i));
   END LOOP;
   CLOSE foo_cur;
   COMMIT;
END;
Upvotes: 1
Reputation: 3311
OK, I know you wanted the cursor...
The only real advantage to using the cursor is being able to commit every 10k or so rows when processing that much data, to avoid filling up the logs. Unless you really need the cursor, eliminate the row-by-row processing.
insert into tracker (id, foo, bar, baz)
select trackersequence.nextval, t1.foo, t1.bar, t1.baz
  from tableone t1, tabletwo t2
 where t1.foo = t2.foo
   and t1.bar = t2.bar;
Or with the direct-path insert hint, as suggested:
insert /*+ append */ into tracker (id, foo, bar, baz)
select trackersequence.nextval, t1.foo, t1.bar, t1.baz
  from tableone t1, tabletwo t2
 where t1.foo = t2.foo
   and t1.bar = t2.bar;
Upvotes: 3
Reputation: 3497
How about:
INSERT INTO tracker t
SELECT trackerSequence.nextval
      ,t1.foo
      ,t1.bar
      ,t1.baz
  FROM tableOne t1
 INNER JOIN tableTwo tt
    ON (t1.foo = tt.foo AND t1.bar = tt.bar);
I wonder if that would be optimized better.
Also make sure the indexes on the tracker table are disabled while inserting.
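One way to do that (the index name tracker_foo_ix is hypothetical; repeat for each index on tracker, and note that unique indexes backing constraints cannot simply be skipped):

ALTER INDEX tracker_foo_ix UNUSABLE;
ALTER SESSION SET skip_unusable_indexes = TRUE;

-- run the INSERT ... SELECT here, then rebuild:
ALTER INDEX tracker_foo_ix REBUILD;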
Upvotes: 6
Reputation: 5866
First - look at how you can optimize the PL/SQL itself.
Second - don't do the insert with PL/SQL. Use bulk loading (as already suggested by some comments); you can easily find lots of info on it if you Google for "oracle sql loader".
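As a rough sketch of that route (the file names and CSV layout are assumptions; the matched rows would first have to be dumped to a flat file):

-- tracker.ctl (hypothetical control file)
LOAD DATA
INFILE 'tracker_rows.csv'
APPEND
INTO TABLE tracker
FIELDS TERMINATED BY ','
(id, foo, bar, baz)

It can then be run with a direct-path load: sqlldr control=tracker.ctl direct=true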
Upvotes: 0