MrM

Reputation: 399

Optimize bulk update with parallelization

I have a procedure that scrambles transactional data when live data is moved to a test environment. The table in question holds approx. 100 million rows spread across 50 partitions. A new partition is added every month. As the volume increases, the procedure executes more and more slowly.

I am looking into introducing some degree of parallelization into my code. This is new territory for me and I am wondering if there are any best practices. Perhaps use dbms_parallel_execute to split the update into chunks?

Any recommendations on how to optimize my code are very much appreciated!

PROCEDURE Scramble_Transactions
AS
    vSeed              BINARY_INTEGER;

    CURSOR Transactions_cur
    IS
        SELECT T.ID,
               T.MONTH_PARTITION,
               T.TRACE_NUM,
               T.TXTDATA
          FROM TRANSACTIONS T;

    TYPE TBL IS TABLE OF Transactions_cur%ROWTYPE
        INDEX BY PLS_INTEGER;

    Transactions_Rec   TBL;

    vCounter           NUMBER (10);
    vString            VARCHAR2 (300);
    vLen               NUMBER (5);
    vFromRange         VARCHAR2 (25);
    vToRange           VARCHAR2 (25);
BEGIN
    vCounter := 0;

    SELECT SUBSTR (TO_CHAR (SYSDATE, 'ddmmyyyyhhmiss'), 11)
      INTO vSeed
      FROM DUAL;

    DBMS_RANDOM.initialize (vSeed);
    DBMS_RANDOM.SEED (vSeed);
    vFromRange := 0;

    OPEN Transactions_cur;

    LOOP
        FETCH Transactions_cur BULK COLLECT INTO Transactions_Rec LIMIT 10000;

        FOR I IN 1 .. Transactions_Rec.COUNT
        LOOP
            IF Transactions_Rec (i).TRACE_NUM IS NOT NULL
            THEN
                vString := Transactions_Rec (i).TRACE_NUM;
                vLen := LENGTH (TRIM (vString));
                vToRange := POWER (10, vLen) - 1;
                Transactions_Rec (i).TRACE_NUM :=
                    LPAD (TRUNC (DBMS_RANDOM.VALUE (vFromRange, vToRange)),
                          6,
                          '1');
            END IF;

            IF Transactions_Rec (i).TXTDATA IS NOT NULL
            THEN
                vString := Transactions_Rec (i).TXTDATA;
                vLen := LENGTH (TRIM (vString));
                vToRange := POWER (10, vLen) - 1;
                Transactions_Rec (i).TXTDATA :=
                    LPAD (TRUNC (DBMS_RANDOM.VALUE (vFromRange, vToRange)),
                          12,
                          '3');
            END IF;

            vCounter := vCounter + 1;
        END LOOP;

        FORALL rec IN 1 .. Transactions_Rec.COUNT
            UPDATE Transactions
               SET TRACE_NUM = Transactions_Rec (rec).TRACE_NUM,
                   TXTDATA = Transactions_Rec (rec).TXTDATA
             WHERE ID = Transactions_Rec (rec).ID
               AND MONTH_PARTITION = Transactions_Rec (rec).MONTH_PARTITION;

        EXIT WHEN Transactions_cur%NOTFOUND;
    END LOOP;

    DBMS_RANDOM.TERMINATE;

    CLOSE Transactions_cur;

    COMMIT;
END Scramble_Transactions;

Edit: my solution, based on the feedback below. I rewrote part of the procedure so that the data scrambling is done in SQL instead of PL/SQL. The procedure now also takes partition from/to values as parameters, which allows parallel processing.

CREATE OR REPLACE PROCEDURE Scramble_Transactions(P_MONTH_PARTITION_FROM VARCHAR2, P_MONTH_PARTITION_TO VARCHAR2)
AS

    CURSOR Transactions_cur (V_MONTH_PARTITION_FROM TRANSACTIONS.MONTH_PARTITION%TYPE,
                             V_MONTH_PARTITION_TO   TRANSACTIONS.MONTH_PARTITION%TYPE) IS
        SELECT T.ID,
               T.MONTH_PARTITION,
               REGEXP_REPLACE(T.TRACE_NUM,'[0-9]','9') TRACE_NUM,
               REGEXP_REPLACE(T.TXTDATA,'[0-9]','9') TXTDATA
          FROM TRANSACTIONS T
         WHERE T.MONTH_PARTITION BETWEEN V_MONTH_PARTITION_FROM AND V_MONTH_PARTITION_TO;

    TYPE TBL IS TABLE OF Transactions_cur%ROWTYPE
        INDEX BY PLS_INTEGER;

    Transactions_Rec   TBL;

BEGIN
OPEN Transactions_cur(P_MONTH_PARTITION_FROM,P_MONTH_PARTITION_TO);
LOOP
   FETCH Transactions_cur BULK COLLECT INTO Transactions_Rec LIMIT 10000;

       /*Some additional processing*/

       FORALL rec IN 1 .. Transactions_Rec.COUNT
            UPDATE Transactions
               SET TRACE_NUM = Transactions_Rec (rec).TRACE_NUM,
                   TXTDATA = Transactions_Rec (rec).TXTDATA
             WHERE ID = Transactions_Rec (rec).ID
               AND MONTH_PARTITION = Transactions_Rec (rec).MONTH_PARTITION;

  EXIT WHEN  Transactions_cur%NOTFOUND;
END LOOP;
CLOSE Transactions_cur;
COMMIT;
END;
/

Now execute the procedure in parallel using DBMS_PARALLEL_EXECUTE. The work is split into chunks based on the partition key.

DECLARE
  L_TASK_SQL CLOB;
  V_TASKNAME USER_PARALLEL_EXECUTE_TASKS.TASK_NAME%TYPE;
  V_STATUS   USER_PARALLEL_EXECUTE_TASKS.STATUS%TYPE;
  C_TASK_NAME VARCHAR2(50) := 'TRANSACTIONS_TASK';
BEGIN
  L_TASK_SQL := 'SELECT PARTITION_NAME, PARTITION_NAME FROM USER_TAB_PARTITIONS WHERE TABLE_NAME = ''TRANSACTIONS''';
  DBMS_PARALLEL_EXECUTE.CREATE_TASK(C_TASK_NAME);
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL(
        TASK_NAME => 'TRANSACTIONS_TASK',
        SQL_STMT  => L_TASK_SQL,
        BY_ROWID  => FALSE);
  DBMS_PARALLEL_EXECUTE.RUN_TASK(
        TASK_NAME      => C_TASK_NAME,
        SQL_STMT => 'BEGIN SCRAMBLE_TRANSACTIONS( :START_ID, :END_ID ); END;',
        LANGUAGE_FLAG  => DBMS_SQL.NATIVE,
        PARALLEL_LEVEL => 6);

  SELECT TASK_NAME, STATUS INTO V_TASKNAME,V_STATUS FROM USER_PARALLEL_EXECUTE_TASKS WHERE TASK_NAME = C_TASK_NAME; 
  DBMS_OUTPUT.PUT_LINE('TASK:'|| V_TASKNAME ||' , STATUS:'|| V_STATUS);

  DBMS_PARALLEL_EXECUTE.DROP_CHUNKS(TASK_NAME => 'TRANSACTIONS_TASK');
  DBMS_PARALLEL_EXECUTE.DROP_TASK(TASK_NAME  => 'TRANSACTIONS_TASK');
END;
/

Total execution time is down to 30 minutes, compared with 13-14 hours previously.

Upvotes: 0

Views: 1186

Answers (2)

BobC

Reputation: 4424

I think you would be much better off in terms of performance using a CTAS (create table ... as select) or an insert /*+ append */ ... rather than an update. Since your data is partitioned, you can then employ partition exchange. This would allow you to use parallelism much more effectively, together with direct-path load operations.
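
As a rough illustration of that idea, the sketch below scrambles one partition through a direct-path insert into a staging table and then exchanges it back in. The staging table name TRANSACTIONS_STAGE and the partition name P_2020_01 are hypothetical, the table is assumed to have only the four columns shown in the question, and the REGEXP_REPLACE masking is borrowed from the question's edit. Local indexes and constraints are not handled here and would need attention in a real table.

```sql
-- Sketch only. TRANSACTIONS_STAGE and P_2020_01 are hypothetical names,
-- and TRANSACTIONS is assumed to have just these four columns.

-- 1. Empty staging table with the same column definitions as TRANSACTIONS.
CREATE TABLE transactions_stage AS
SELECT * FROM transactions WHERE 1 = 0;

-- 2. Direct-path load of the scrambled rows for one partition.
INSERT /*+ APPEND */ INTO transactions_stage
       (id, month_partition, trace_num, txtdata)
SELECT t.id,
       t.month_partition,
       REGEXP_REPLACE(t.trace_num, '[0-9]', '9'),
       REGEXP_REPLACE(t.txtdata,   '[0-9]', '9')
  FROM transactions PARTITION (p_2020_01) t;

COMMIT;

-- 3. Swap the scrambled segment in for the original partition.
ALTER TABLE transactions
  EXCHANGE PARTITION p_2020_01 WITH TABLE transactions_stage
  WITHOUT VALIDATION;

-- 4. The staging table now holds the unscrambled rows; drop it.
DROP TABLE transactions_stage PURGE;
```

Because each partition gets its own staging table and exchange, the steps can be repeated per partition, in separate sessions if desired.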

Upvotes: 0

Connor McDonald

Reputation: 11591

SQL is a good option, but perhaps one very quick fix relates to the fact that you're updating the same table you are fetching from. That can create huge undo issues, because the fetch must return a result set that is consistent as of a single point in time. So each time around the fetch loop, you might be doing more and more work (undoing the updates you've just made in order to rebuild that consistent view). Of course, committing in each loop iteration then creates the issue of restartability on error. So maybe do it one partition at a time, and without looping, e.g.

PROCEDURE Scramble_Transactions(p_parname varchar2) AS
    vSeed              BINARY_INTEGER;


    Transactions_cur sys_refcursor;

    CURSOR Transactions_cur_template
    IS
        SELECT T.ID,
               T.MONTH_PARTITION,
               T.TRACE_NUM,
               T.TXTDATA
          FROM TRANSACTIONS T;

    TYPE TBL IS TABLE OF Transactions_cur_template%ROWTYPE INDEX BY PLS_INTEGER;

    Transactions_Rec   TBL;

    vCounter           NUMBER (10);
    vString            VARCHAR2 (300);
    vLen               NUMBER (5);
    vFromRange         VARCHAR2 (25);
    vToRange           VARCHAR2 (25);
BEGIN
    vCounter := 0;

    SELECT SUBSTR (TO_CHAR (SYSDATE, 'ddmmyyyyhhmiss'), 11)
      INTO vSeed
      FROM DUAL;

    DBMS_RANDOM.initialize (vSeed);
    DBMS_RANDOM.SEED (vSeed);
    vFromRange := 0;

    OPEN Transactions_cur for ' SELECT T.ID,
               T.MONTH_PARTITION,
               T.TRACE_NUM,
               T.TXTDATA
          FROM TRANSACTIONS partition ('||p_parname||') T where T.TRACE_NUM IS NOT NULL or T.TXTDATA IS NOT NULL';

        FETCH Transactions_cur BULK COLLECT INTO Transactions_Rec;

        FOR I IN 1 .. Transactions_Rec.COUNT
        LOOP
            IF Transactions_Rec (i).TRACE_NUM IS NOT NULL
            THEN
                vString := Transactions_Rec (i).TRACE_NUM;
                vLen := LENGTH (TRIM (vString));
                vToRange := POWER (10, vLen) - 1;
                Transactions_Rec (i).TRACE_NUM :=
                    LPAD (TRUNC (DBMS_RANDOM.VALUE (vFromRange, vToRange)),
                          6,
                          '1');
            END IF;

            IF Transactions_Rec (i).TXTDATA IS NOT NULL
            THEN
                vString := Transactions_Rec (i).TXTDATA;
                vLen := LENGTH (TRIM (vString));
                vToRange := POWER (10, vLen) - 1;
                Transactions_Rec (i).TXTDATA :=
                    LPAD (TRUNC (DBMS_RANDOM.VALUE (vFromRange, vToRange)),
                          12,
                          '3');
            END IF;

            vCounter := vCounter + 1;
        END LOOP;

        FORALL rec IN 1 .. Transactions_Rec.COUNT
            UPDATE Transactions
               SET TRACE_NUM = Transactions_Rec (rec).TRACE_NUM,
                   TXTDATA = Transactions_Rec (rec).TXTDATA
             WHERE ID = Transactions_Rec (rec).ID
               AND MONTH_PARTITION = Transactions_Rec (rec).MONTH_PARTITION;

    DBMS_RANDOM.TERMINATE;

    CLOSE Transactions_cur;

    COMMIT;
END Scramble_Transactions;

So with just a few lines of code changes, we've

  • eliminated the problem of the fetch doing lots of undo work
  • made it easily run in parallel by taking partition name as a parameter

You could then submit a job (using say DBMS_SCHEDULER) for each partition name, and because we are now isolating per partition, we won't get contention across the jobs.
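
A minimal sketch of that submission step might look like the following. It assumes Scramble_Transactions(p_parname) is callable as a standalone procedure in the current schema, and the SCRAMBLE_ job-name prefix is a hypothetical naming scheme. How many jobs actually run at once is capped by the JOB_QUEUE_PROCESSES setting.

```sql
-- Sketch only: one scheduler job per partition of TRANSACTIONS.
-- Assumes Scramble_Transactions(p_parname) exists as a standalone procedure.
BEGIN
  FOR p IN (SELECT partition_name
              FROM user_tab_partitions
             WHERE table_name = 'TRANSACTIONS')
  LOOP
    DBMS_SCHEDULER.CREATE_JOB(
      job_name   => 'SCRAMBLE_' || p.partition_name,  -- hypothetical naming scheme
      job_type   => 'PLSQL_BLOCK',
      job_action => 'BEGIN Scramble_Transactions('''
                    || p.partition_name || '''); END;',
      enabled    => TRUE,    -- no start_date, so the job runs as soon as a slot is free
      auto_drop  => TRUE);   -- remove the job definition once it has completed
  END LOOP;
END;
/
```

Progress and failures can then be checked in USER_SCHEDULER_JOB_RUN_DETAILS.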

Don't get me wrong - a full refactoring in SQL is perhaps still the best option, but in terms of quick wins, the code above might solve your issue with minimal changes done.
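
For comparison, a full SQL version could be as small as the sketch below: one UPDATE per partition with no fetch loop at all. The procedure name Scramble_Partition_Sql is hypothetical, and the REGEXP_REPLACE masking is taken from the question's edit rather than the DBMS_RANDOM logic above, so the scrambled values are not random.

```sql
-- Sketch only: Scramble_Partition_Sql is a hypothetical name.
-- Masks every digit with '9' instead of generating random values.
CREATE OR REPLACE PROCEDURE Scramble_Partition_Sql (p_parname IN VARCHAR2)
AS
BEGIN
  -- DBMS_ASSERT.SIMPLE_SQL_NAME rejects anything that is not a plain identifier,
  -- which guards the concatenated partition name against SQL injection.
  EXECUTE IMMEDIATE
       'UPDATE transactions PARTITION ('
    || DBMS_ASSERT.SIMPLE_SQL_NAME(p_parname)
    || ') t
          SET t.trace_num = REGEXP_REPLACE(t.trace_num, ''[0-9]'', ''9''),
              t.txtdata   = REGEXP_REPLACE(t.txtdata,   ''[0-9]'', ''9'')
        WHERE t.trace_num IS NOT NULL
           OR t.txtdata IS NOT NULL';

  COMMIT;
END Scramble_Partition_Sql;
/
```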

Upvotes: 2
