Reputation: 35
I'm facing a data migration: my goal is to update 2.5M rows in less than 8 hours, because the customer has only a limited window of time in which the service can be deactivated. Moreover, the table can't be locked during the execution because it is used by other procedures; I can only lock the records. The execution will be done through a batch process. Probably "migration" isn't the correct word in this case; "altering data" might be more accurate...
System: Oracle 11g
Table name: Tab1 - Total rows: 520,000,000 - AVG row len: 57
DESC Tab1;
Name Null? Type
---------------- -------- -----------
t_id NOT NULL NUMBER
t_fk1_id NUMBER
t_fk2_id NUMBER
t_start_date NOT NULL DATE
t_end_date DATE
t_del_flag NOT NULL NUMBER(1)
t_flag1 NOT NULL NUMBER(1)
f_falg2 NOT NULL NUMBER(1)
t_creation_date DATE
t_creation_user NUMBER(10)
t_last_update DATE
t_user_update NUMBER(10)
t_flag3 NUMBER(1)
Indexes are:
T_ID_PK [t_id] UNIQUE
T_IN_1 [t_fk2_id,t_fk1_id,t_start_date,t_del_flag] NONUNIQUE
T_IN_2 [t_last_update,t_fk2_id] NONUNIQUE
T_IN_3 [t_fk2_id,t_fk1_id] NONUNIQUE
So far I've thought of some possible solutions, and I've already tested most of them:
With the above solutions I've faced some issues: when executed with the /*+ PARALLEL(x) */ hint the table was locked, and the /*+ RESULT_CACHE */ hint didn't seem to affect the selection time at all. My last idea is to partition the table by a new column and use that to avoid table locking, then proceed with solution 1.
Here is the query used for the MERGE option (for the other two it is more or less the same):
DECLARE
  v_recordset    NUMBER;
  v_row_count    NUMBER;
  v_start_subset NUMBER;
  v_tot_loops    NUMBER;
BEGIN
  -- Values are set manually for example purposes; I've used the same values
  v_recordset := 10000;
  v_tot_loops := 10000;

  SELECT NVL(MIN(MOD(m_id, v_recordset)), 99999)
    INTO v_start_subset
    FROM MIGRATION_TABLE
   WHERE m_status = 0; -- 0 = not migrated, 1 = migrated

  FOR v_n_subset IN v_start_subset .. v_tot_loops
  LOOP
    BEGIN
      MERGE INTO Tab1 T1
      USING (
        SELECT m.m_new_id, c2.c_id, t.t_id
          FROM MIGRATION_TABLE m
          JOIN Tab1       t  ON t.t_fk1_id = m.m_old_id
          JOIN ChildTable c  ON c.c_id     = t.t_fk2_id
          JOIN ChildTable c2 ON c.c_name   = c2.c_name -- c_name is a UNIQUE index of ChildTable
         WHERE MOD(m.m_id, v_recordset) = v_n_subset
           AND c.c_fk_id  = old_product_id -- value obtained from another subsystem
           AND c2.c_fk_id = new_product_id -- value obtained from another subsystem
           AND t.t_del_flag = 0            -- not deleted items
      ) T2
      ON (T1.t_id = T2.t_id)
      WHEN MATCHED THEN UPDATE SET
        T1.t_fk1_id      = T2.m_new_id,
        T1.t_fk2_id      = T2.c_id,
        T1.t_last_update = TRUNC(SYSDATE);

      -- Mark the records of this subset as migrated and proceed
      UPDATE MIGRATION_TABLE
         SET m_status = 1
       WHERE MOD(m_id, v_recordset) = v_n_subset;

      COMMIT;
    EXCEPTION WHEN OTHERS THEN
      ROLLBACK;
    END;
  END LOOP;
END;
In the above script I've removed the parallel and cache options, but I've already tested it with both and obtained no better result.
Could anyone help me with this, please? In more than a week I haven't been able to reach the desired timing. Any ideas?
MIGRATION_TABLE
CREATE TABLE MIGRATION_TABLE(
  m_customer_from VARCHAR2(5 BYTE),
  m_customer_to   VARCHAR2(5 BYTE),
  m_old_id        NUMBER(10,0) NOT NULL,
  m_new_id        NUMBER(10,0) NOT NULL,
  m_status        VARCHAR2(100 BYTE),
  CONSTRAINT M_MIG_PK_1 PRIMARY KEY (m_old_id) ENABLE
);

CREATE UNIQUE INDEX M_MIG_PK_1 ON MIGRATION_TABLE (m_old_id ASC);
ChildTable
CREATE TABLE ChildTable(
  c_id    NUMBER(10,0) NOT NULL,
  c_fk_id NUMBER(10,0),
  c_name  VARCHAR2(100 BYTE),
  c_date  DATE,
  c_note  VARCHAR2(100 BYTE),
  CONSTRAINT C_CT_PK_1 PRIMARY KEY (c_id) ENABLE
);

CREATE UNIQUE INDEX C_CT_PK_1 ON ChildTable (c_id ASC);
CREATE UNIQUE INDEX C_CT_PK_2 ON ChildTable (c_name ASC, c_fk_id ASC);
Upvotes: 1
Views: 79
Reputation: 8361
If methods 1 and 2 are still too slow, you could follow your partitioning idea. For instance, introduce a column to distinguish the rows to be migrated. Because of the DEFAULT ... NOT NULL clause, this will be very fast:
ALTER TABLE Tab1 ADD (todo NUMBER DEFAULT 0 NOT NULL);
Now partition your table into two partitions: one with the migration data, one with the rest that you will not touch. I don't have much experience with introducing partitions while the application is running, but I think it is solvable, for instance with online redefinition or:
ALTER TABLE Tab1 MODIFY
PARTITION BY LIST (todo) (
PARTITION pdonttouch VALUES (0),
PARTITION pmigration VALUES (1)
) ONLINE UPDATE INDEXES (
T_ID_PK GLOBAL, T_IN_1 GLOBAL,
T_IN_2 GLOBAL, T_IN_3 GLOBAL
);
Now you can identify the rows to be moved. This can be done row by row, doesn't affect the other processes, and should not count towards your downtime. The migration rows will move from partition pdonttouch to partition pmigration, therefore you need to enable row movement:
ALTER TABLE Tab1 ENABLE ROW MOVEMENT;
UPDATE Tab1 SET todo=1 WHERE .... JOIN ...;
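Filling in the elided predicate, that marking update might look roughly like this (a sketch only; the exact predicates depend on the product-id filters from the question, which I've left out):

```sql
-- Mark only the rows that appear in the migration mapping and are not deleted
UPDATE Tab1 t
   SET t.todo = 1
 WHERE t.t_del_flag = 0
   AND EXISTS (SELECT 1
                 FROM MIGRATION_TABLE m
                WHERE m.m_old_id = t.t_fk1_id);
```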
Now you can work on the partition PMIGRATION and update the data there. This should be much faster than on the original table, as the size of the partition is only 0.5% of the whole table. Don't know about the indexes, though.
Theoretically, you could create a table with the same structure and data as PMIGRATION, work on that table, and once done, swap the partition and the working table with EXCHANGE PARTITION. Don't know about the indexes, again.
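A minimal sketch of that exchange, assuming a staging table tab1_stage with exactly the same column layout as Tab1 (the staging table name is made up here):

```sql
-- Hypothetical staging table populated from the migration partition
CREATE TABLE tab1_stage AS
SELECT * FROM Tab1 PARTITION (pmigration);

-- ... run the heavy updates on tab1_stage here ...

-- Swap the updated staging table back in for the partition.
-- WITHOUT VALIDATION skips checking that every row belongs to the partition;
-- the global indexes from above still need UPDATE GLOBAL INDEXES or a rebuild.
ALTER TABLE Tab1
  EXCHANGE PARTITION pmigration WITH TABLE tab1_stage
  INCLUDING INDEXES WITHOUT VALIDATION
  UPDATE GLOBAL INDEXES;
```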
Upvotes: 0
Reputation: 8361
Method 2 is similar to method 1, but it uses ROWIDs instead of a primary key. In theory it should therefore be a bit faster.
CREATE TABLE migration_temp NOLOGGING AS
SELECT t.t_id,
t.rowid AS rid,
m.m_new_id AS new_fk1_id,
c2.c_id AS new_fk2_id
FROM MIGRATION_TABLE m
JOIN Tab1 t ON t.t_fk1_id = m.m_old_id
JOIN ChildTable c1 ON c1.c_id = t.t_fk2_id
JOIN ChildTable c2 ON c1.c_name = c2.c_name
WHERE t.t_del_flag = 0
ORDER BY t.rowid;
EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'migration_temp');
MERGE INTO Tab1 t USING migration_temp m ON (t.rowid = m.rid)
WHEN MATCHED THEN UPDATE SET
t.t_fk1_id = m.new_fk1_id,
t.t_fk2_id = m.new_fk2_id,
t.t_last_update = trunc(sysdate);
You could think of batching the MERGE based on blocks of ROWIDs. Those tend to be logically colocated, therefore it should be a bit faster.
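One way to sketch that batching, assuming a helper bucket column is added when the temp table is created (the column name and the bucket count of 100 are arbitrary choices):

```sql
-- Assign consecutive ROWID ranges to numbered buckets at creation time
CREATE TABLE migration_temp NOLOGGING AS
SELECT s.*, NTILE(100) OVER (ORDER BY s.rid) AS bucket
FROM (
  SELECT t.t_id, t.rowid AS rid,
         m.m_new_id AS new_fk1_id, c2.c_id AS new_fk2_id
  FROM MIGRATION_TABLE m
  JOIN Tab1 t ON t.t_fk1_id = m.m_old_id
  JOIN ChildTable c1 ON c1.c_id = t.t_fk2_id
  JOIN ChildTable c2 ON c1.c_name = c2.c_name
  WHERE t.t_del_flag = 0
) s;

-- One MERGE per bucket of colocated ROWIDs, committing in between
BEGIN
  FOR b IN 1 .. 100 LOOP
    MERGE INTO Tab1 t
    USING (SELECT * FROM migration_temp WHERE bucket = b) m
    ON (t.rowid = m.rid)
    WHEN MATCHED THEN UPDATE SET
      t.t_fk1_id      = m.new_fk1_id,
      t.t_fk2_id      = m.new_fk2_id,
      t.t_last_update = TRUNC(SYSDATE);
    COMMIT;
  END LOOP;
END;
/
```

Since NTILE orders by rid, each bucket covers a contiguous ROWID range, so each MERGE touches a compact set of blocks.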
Upvotes: 1
Reputation: 8361
Wow, 520 million rows! However, updating 2.5 million of them is only 0.5%, so that should be doable. Not knowing your data, my first assumption is that the self join of Tab1 x Tab1 inside the MERGE takes up most of the time. Possibly also the many joins to the migration and child tables. And the indexes T_IN_1, 2, and 3 need maintenance, too.
As you say the rows to be updated are fixed, I'd try to prepare the heavy work up front. This doesn't lock the table and wouldn't count towards the downtime:
CREATE TABLE migration_temp NOLOGGING AS
SELECT t.t_id,
m.m_new_id AS new_fk1_id,
c2.c_id AS new_fk2_id
FROM MIGRATION_TABLE m
JOIN Tab1 t ON t.t_fk1_id = m.m_old_id
JOIN ChildTable c1 ON c1.c_id = t.t_fk2_id
JOIN ChildTable c2 ON c1.c_name = c2.c_name
WHERE t.t_del_flag = 0;
I omitted the bit with the old/new product_ids because I didn't fully understand how it should work, but that is hopefully not a problem.
Method 1 would be a join via primary keys:
ALTER TABLE migration_temp ADD CONSTRAINT pk_migration_temp PRIMARY KEY(t_id);
EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'migration_temp');
MERGE INTO Tab1 t USING migration_temp m ON (t.t_id = m.t_id)
WHEN MATCHED THEN UPDATE SET
t.t_fk1_id = m.new_fk1_id,
t.t_fk2_id = m.new_fk2_id,
t.t_last_update = trunc(sysdate);
I'm not a fan of batched updates. As you have time estimates, it looks like you have a test system. I'd suggest giving it a go and trying it in one batch.
Upvotes: 0