Woot4Moo

Reputation: 24336

Oracle SQL insert large data set efficiently via cursor use and single commit

The following PL/SQL finds all matching records between two tables that have identical schemas, then iterates over the cursor holding that result set, doing a row-by-row insert with a single commit at the end. My question is: how can I get the maximum performance out of this type of query? Code follows:

DECLARE
   CURSOR foo IS
        SELECT * FROM tableOne t1
        WHERE EXISTS (SELECT * FROM tableTwo tt
                       WHERE t1.foo = tt.foo
                       AND t1.bar = tt.bar);  -- THIS TAKES 5 MINUTES (66 MILLION ROWS)
BEGIN
     FOR nextFoo IN foo
     LOOP
            INSERT INTO tracker
               (id, foo, bar, baz)
            VALUES (trackerSequence.nextval, nextFoo.foo, nextFoo.bar, nextFoo.baz);
     END LOOP;
     COMMIT;
END;

This query can take upwards of an hour and I am trying to reduce the time cost associated with it. I will be processing 140 million records in general so I am expecting to double the amount of time this process takes. All columns are indexed.

Version information:

10g 10.2

Upvotes: 0

Views: 16407

Answers (6)

Woot4Moo

Reputation: 24336

I have found the following will do 130 million inserts in about 49 minutes.

INSERT INTO tracker (id, foo, bar, baz)
    SELECT trackerSequence.nextval, t1.foo, t1.bar, t1.baz
    FROM tableOne t1
        WHERE NOT EXISTS (SELECT * FROM tableTwo tt
                       WHERE t1.foo = tt.foo
                       AND t1.bar = tt.bar);

Upvotes: 0

Anjan Biswas

Reputation: 7932

I have almost always gotten better performance on bulk inserts like this by combining bitmap indexes with a direct path load (DPL), i.e. the /*+ APPEND */ hint.

I would also assume that you have proper indexes on both TT.FOO, TT.BAR and T1.FOO, T1.BAR. So something like:

INSERT /*+ APPEND */
  INTO tracker
SELECT trackerSequence.nextval, t1.foo, t1.bar, t1.baz
  FROM tableOne t1
  WHERE EXISTS (SELECT 'x'
                  FROM tableTwo tt
                 WHERE t1.foo = tt.foo
                   AND t1.bar = tt.bar);

Also, keep in mind that the EXISTS clause can bite you back in certain situations, so you may want to use a plain join instead:

INSERT /*+ APPEND */
  INTO tracker
SELECT trackerSequence.nextval, q.foo, q.bar, q.baz
  FROM (SELECT DISTINCT t1.foo, t1.bar, t1.baz  -- DISTINCT must go in an inline view:
          FROM tableOne t1, tableTwo tt         -- sequences are not allowed with SELECT DISTINCT (ORA-02287)
         WHERE t1.foo = tt.foo
           AND t1.bar = tt.bar) q;

Remember: a direct path load will not always improve the performance of your query; it helps most when your table is properly partitioned.

Try EXPLAIN PLAN on these queries to find out which is best. Also (as one of the answers already mentions), do not commit only at the very end, but do not commit on every record either. Use a custom commit interval, similar to using LIMIT xxxx when BULK COLLECTing; your commit points govern how large your rollback segments grow. You can also implement a custom commit point (as simple as a counter) procedurally, i.e. in a PL/SQL block.
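A counter-based commit interval might look like this sketch (the 10,000-row interval is an illustrative assumption; table and column names are taken from the question):

```sql
DECLARE
   v_count PLS_INTEGER := 0;
BEGIN
   FOR rec IN (SELECT t1.foo, t1.bar, t1.baz
                 FROM tableOne t1
                WHERE EXISTS (SELECT 'x' FROM tableTwo tt
                               WHERE t1.foo = tt.foo
                                 AND t1.bar = tt.bar))
   LOOP
      INSERT INTO tracker (id, foo, bar, baz)
      VALUES (trackerSequence.nextval, rec.foo, rec.bar, rec.baz);

      v_count := v_count + 1;
      IF MOD(v_count, 10000) = 0 THEN  -- illustrative commit interval
         COMMIT;
      END IF;
   END LOOP;
   COMMIT;  -- pick up the final partial batch
END;
```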

Query performance also depends on the high-water mark (HWM) of the table; you almost always want to stay below the HWM. A TRUNCATE on the TRACKER table would reset it, but the existing data would be lost, so that is hardly a solution here. Follow this AskTom link to learn how to find the HWM.

Upvotes: 0

Gaurav Soni

Reputation: 6336

DECLARE
   CURSOR foo_cur IS
      SELECT * FROM tableOne t1
       WHERE EXISTS (SELECT * FROM tableTwo tt
                      WHERE t1.foo = tt.foo
                        AND t1.bar = tt.bar);  -- THIS TAKES 5 MINUTES (66 MILLION ROWS)

   TYPE foo_nt IS TABLE OF tableOne%ROWTYPE;
   v_foo_nt foo_nt;
BEGIN
   OPEN foo_cur;
   LOOP
      FETCH foo_cur BULK COLLECT INTO v_foo_nt LIMIT 1000;
      EXIT WHEN v_foo_nt.COUNT = 0;  -- guard against an empty final fetch

      FORALL i IN v_foo_nt.FIRST .. v_foo_nt.LAST
         INSERT INTO tracker (id, foo, bar, baz)
         VALUES (trackerSequence.nextval, v_foo_nt(i).foo, v_foo_nt(i).bar, v_foo_nt(i).baz);
   END LOOP;
   CLOSE foo_cur;
   COMMIT;
END;

Upvotes: 1

Mike

Reputation: 3311

OK, I know you wanted the cursor...

The only real advantage of using a cursor here is being able to commit every ~10k rows while processing that much data, to avoid filling up the logs.

Unless you really need the cursor, eliminate the row-by-row processing:

insert into tracker (id, foo, bar, baz)
select trackersequence.nextval, t1.foo, t1.bar, t2.baz
from tableone t1, tabletwo t2 where 
t1.foo = t2.foo and
t1.bar = t2.bar;

Or with the Direct Path Insert hint, as suggested elsewhere:

insert /*+ append */ into tracker (id, foo, bar, baz)
select trackersequence.nextval, t1.foo, t1.bar, t2.baz
from tableone t1, tabletwo t2 where 
t1.foo = t2.foo and
t1.bar = t2.bar;

Upvotes: 3

Markus Mikkolainen

Reputation: 3497

How about:

INSERT INTO tracker SELECT trackerSequence.nextval
                          ,foo
                          ,bar
                          ,baz
                    FROM tableOne t1
                          INNER JOIN tableTwo tt
                       ON (t1.foo = tt.foo AND t1.bar = tt.bar);

I wonder if that would be optimized better.

Also make sure the tracker table's indexes are disabled while inserting.
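Disabling and rebuilding the indexes around the load might look like this sketch (the index name is a hypothetical; substitute your own):

```sql
ALTER INDEX tracker_foo_idx UNUSABLE;           -- hypothetical index name
ALTER SESSION SET skip_unusable_indexes = TRUE; -- let DML proceed past the unusable index

-- ... run the bulk INSERT ... SELECT here ...

ALTER INDEX tracker_foo_idx REBUILD;            -- rebuild once the load is done
```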

Upvotes: 6

YePhIcK

Reputation: 5866

First - how can you optimize your PL/SQL's performance:

  • Disable indexes and any other constraints on the target table before you begin your load, and re-enable them after you are done
  • Don't commit only at the very end - have intermediate commit points to free up rollback segments

Second - don't do the insert with PL/SQL. Use bulk loading (as already suggested in some comments). You can easily find lots of information on bulk loading if you Google for "oracle sql loader".
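For reference, a minimal SQL*Loader control file for the tracker table might look like this sketch (the data file name is an assumption, and the id column is filled from the sequence via a SQL expression):

```
LOAD DATA
INFILE 'matching_rows.dat'        -- hypothetical flat-file export of the matching rows
APPEND INTO TABLE tracker
FIELDS TERMINATED BY ','
( id  EXPRESSION "trackerSequence.nextval",
  foo,
  bar,
  baz )
```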

Upvotes: 0
