Cyber

Reputation: 13

Updating Millions of Records Oracle

I have written a query to update a column across 35 million records, but unfortunately it took more than an hour to run. Did I miss anything in the query below?

DECLARE
    CURSOR exp_cur IS
        SELECT
            DECODE(
              COLUMN_NAME,
              NULL, NULL,
              standard_hash(COLUMN_NAME)
            ) AS COLUMN_NAME
        FROM TABLE1;

    TYPE  nt_fName IS TABLE OF VARCHAR2(100);
    fname nt_fName;

BEGIN
    OPEN exp_cur;

    FETCH exp_cur BULK COLLECT INTO fname LIMIT 1000000;

    CLOSE exp_cur;

    --Print data
    FOR idx IN 1 .. fname.COUNT
    LOOP
        UPDATE TABLE1 SET COLUMN_NAME=fname(idx);

        commit;

        DBMS_OUTPUT.PUT_LINE (idx||' '||fname(idx) );

    END LOOP;
END;

Upvotes: 1

Views: 10070

Answers (2)

William Robertson

Reputation: 16001

The reason bulk collect used with a forall construction is generally faster than the equivalent row-by-row loop is that it applies all the updates in one shot, instead of laboriously stepping through the rows one at a time and launching 35 million separate update statements, each one requiring the database to search for the individual row before updating it.

But what you have written (even once the bugs are fixed) is still a row-by-row loop with 35 million search-and-update statements, plus the additional overhead of populating a 700 MB array in memory, 35 million commits, and 35 million dbms_output messages. It has to be slower, because it has significantly more work to do than a plain update.

If it is practical to copy the data to a new table, insert will be a lot faster than update. At the end you can reapply any grants, indexes and constraints to the new table, rename both tables and drop the old one. You can also use insert /*+ parallel enable_parallel_dml */ (prior to Oracle 12c, you have to alter session enable parallel dml separately). You could define the new table as nologging during the copy, but check with your DBA, as that can affect replication and backups, though that might not matter if this is a test system. This will all need careful scripting if it's going to form part of a routine workflow.
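A sketch of that copy-and-rename approach, assuming the table and column names from the question; `OTHER_COL` stands in for whatever remaining columns the real table has, and the DECODE mirrors the question's NULL handling:

```sql
-- Build the new table in one pass, applying the hash on the way in.
-- NOLOGGING and PARALLEL: check with your DBA before using these.
CREATE TABLE TABLE1_NEW NOLOGGING PARALLEL AS
SELECT DECODE(COLUMN_NAME, NULL, NULL,
              STANDARD_HASH(COLUMN_NAME)) AS COLUMN_NAME,
       OTHER_COL  -- placeholder: list the table's remaining columns here
  FROM TABLE1;

-- Reapply grants, indexes and constraints to TABLE1_NEW, then swap:
ALTER TABLE TABLE1 RENAME TO TABLE1_OLD;
ALTER TABLE TABLE1_NEW RENAME TO TABLE1;

-- Once the new table is verified:
-- DROP TABLE TABLE1_OLD;
```

The whole transformation then happens as one set-based statement, with none of the per-row fetch/update overhead described above.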

Upvotes: 4

Popeye

Reputation: 35930

Your code updates all records of TABLE1 in each loop iteration, because the UPDATE has no WHERE clause. Every pass through the loop rewrites all 35 million rows, which is why it is taking so long.

You can simply use a single update statement as follows:

UPDATE TABLE1 SET COLUMN_NAME = standard_hash(COLUMN_NAME)
WHERE COLUMN_NAME IS NOT NULL;

If you still want to use BULK COLLECT and FORALL, you can do it as follows:

DECLARE
    CURSOR EXP_CUR IS
    SELECT COLUMN_NAME FROM TABLE1
     WHERE COLUMN_NAME IS NOT NULL;

    TYPE NT_FNAME IS TABLE OF VARCHAR2(100);
    FNAME NT_FNAME;
BEGIN
    OPEN EXP_CUR;
    LOOP
        -- Fetch in batches; a single fetch with LIMIT would stop
        -- after the first million rows
        FETCH EXP_CUR BULK COLLECT INTO FNAME LIMIT 1000000;
        EXIT WHEN FNAME.COUNT = 0;

        -- One bulk-bound UPDATE per batch instead of one per row
        FORALL IDX IN 1 .. FNAME.COUNT
            UPDATE TABLE1
               SET COLUMN_NAME = STANDARD_HASH(COLUMN_NAME)
             WHERE COLUMN_NAME = FNAME(IDX);

        COMMIT;
    END LOOP;
    CLOSE EXP_CUR;
END;
/

Upvotes: 3
