Reputation: 35
I'm facing a data migration: my goal is to update 2.5M rows in less than 8 hours, because the customer has only a limited window of time in which the service can be deactivated. Moreover, the table can't be locked during the execution because it is used by other procedures; I can only lock the records. The execution will be done through a batch process. Probably "migration" isn't the correct word in this case; "altering data" might be more accurate...
System: Oracle 11g
Table name: Tab1 - Total rows: 520,000,000 - AVG row len: 57
DESC Tab1;
Name Null? Type
---------------- -------- -----------
t_id NOT NULL NUMBER
t_fk1_id NUMBER
t_fk2_id NUMBER
t_start_date NOT NULL DATE
t_end_date DATE
t_del_flag NOT NULL NUMBER(1)
t_flag1 NOT NULL NUMBER(1)
f_falg2 NOT NULL NUMBER(1)
t_creation_date DATE
t_creation_user NUMBER(10)
t_last_update DATE
t_user_update NUMBER(10)
t_flag3 NUMBER(1)
Indexes are:
T_ID_PK [t_id] UNIQUE
T_IN_1 [t_fk2_id,t_fk1_id,t_start_date,t_del_flag] NONUNIQUE
T_IN_2 [t_last_update,t_fk2_id] NONUNIQUE
T_IN_3 [t_fk2_id,t_fk1_id] NONUNIQUE
So far I've thought of some possible solutions, and I've already tested most of them:
With the above solutions I've faced some issues: when executed with the /*+ PARALLEL(x) */ hint the table was locked, and the /*+ RESULT_CACHE */ hint didn't seem to affect the selection time at all. My last idea is to partition the table by a new column and use that to avoid table locking, then proceed with solution 1.
Here is the query used for the MERGE option (for the other two it is more or less the same):
DECLARE
  v_recordset    NUMBER;
  v_row_count    NUMBER;
  v_start_subset NUMBER;
  v_tot_loops    NUMBER;
BEGIN
  -- Values are set manually for example purposes; I've used the same values
  v_recordset := 10000;
  v_tot_loops := 10000;

  SELECT NVL(MIN(MOD(m_id, v_recordset)), 99999)
    INTO v_start_subset
    FROM MIGRATION_TABLE
   WHERE m_status = 0; -- 0 = not migrated, 1 = migrated

  FOR v_n_subset IN v_start_subset .. v_tot_loops
  LOOP
    BEGIN
      MERGE INTO Tab1 T1
      USING (
        SELECT m.m_new_id, c2.c_id, t.t_id
          FROM MIGRATION_TABLE m
          JOIN Tab1       t  ON t.t_fk1_id = m.m_old_id
          JOIN ChildTable c  ON c.c_id     = t.t_fk2_id
          JOIN ChildTable c2 ON c.c_name   = c2.c_name -- c_name is a UNIQUE index of ChildTable
         WHERE MOD(m.m_id, v_recordset) = v_n_subset
           AND c.c_fk_id  = old_product_id -- value obtained from another subsystem
           AND c2.c_fk_id = new_product_id -- value obtained from another subsystem
           AND t.t_del_flag = 0            -- not deleted items
      ) T2
      ON (T1.t_id = T2.t_id)
      WHEN MATCHED THEN UPDATE SET
        T1.t_fk1_id      = T2.m_new_id,
        T1.t_fk2_id      = T2.c_id,
        T1.t_last_update = TRUNC(SYSDATE);

      -- Mark the records of this subset as migrated and proceed
      UPDATE MIGRATION_TABLE
         SET m_status = 1
       WHERE MOD(m_id, v_recordset) = v_n_subset;

      COMMIT;
    EXCEPTION WHEN OTHERS THEN
      ROLLBACK;
    END;
  END LOOP;
END;
In the above script I've removed the parallel and cache options, but I've already tested it with both and obtained no better result.
Could anyone help me with this, please? In more than a week I haven't been able to reach the desired timing. Any ideas?
MIGRATION_TABLE
CREATE TABLE MIGRATION_TABLE(
  m_customer_from VARCHAR2(5 BYTE),
  m_customer_to   VARCHAR2(5 BYTE),
  m_old_id        NUMBER(10,0) NOT NULL,
  m_new_id        NUMBER(10,0) NOT NULL,
  m_status        VARCHAR2(100 BYTE),
  CONSTRAINT M_MIG_PK_1 PRIMARY KEY (m_old_id) ENABLE
);

CREATE UNIQUE INDEX M_MIG_PK_1 ON MIGRATION_TABLE (m_old_id ASC);
ChildTable
CREATE TABLE ChildTable(
  c_id    NUMBER(10,0) NOT NULL,
  c_fk_id NUMBER(10,0),
  c_name  VARCHAR2(100 BYTE),
  c_date  DATE,
  c_note  VARCHAR2(100 BYTE),
  CONSTRAINT C_CT_PK_1 PRIMARY KEY (c_id) ENABLE
);

CREATE UNIQUE INDEX C_CT_PK_1 ON ChildTable (c_id ASC);
CREATE UNIQUE INDEX C_CT_PK_2 ON ChildTable (c_name ASC, c_fk_id ASC);
Upvotes: 1
Views: 79
Reputation: 8361
If methods 1 and 2 are still too slow, you could follow your partitioning idea. For instance, introduce a column to distinguish the rows to be migrated. Because of the DEFAULT ... NOT NULL clause, this will be very fast:
ALTER TABLE Tab1 ADD (todo NUMBER DEFAULT 0 NOT NULL);
Now partition your table into two partitions: one with the migration data, one with the rest that you will not touch. I don't have much experience with introducing partitions while the application is running, but I think it is solvable, for instance with online redefinition or:
ALTER TABLE Tab1 MODIFY
PARTITION BY LIST (todo) (
PARTITION pdonttouch VALUES (0),
PARTITION pmigration VALUES (1)
) ONLINE UPDATE INDEXES (
T_ID_PK GLOBAL, T_IN_1 GLOBAL,
T_IN_2 GLOBAL, T_IN_3 GLOBAL
);
Now you can identify the rows to be moved. This can be done row by row, doesn't affect the other processes, and should not count towards your downtime. The migration rows will move from partition pdonttouch to partition pmigration, therefore you need to enable row movement:
ALTER TABLE Tab1 ENABLE ROW MOVEMENT;
UPDATE Tab1 SET todo=1 WHERE .... JOIN ...;
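Filling in the elided predicate, that marking update might look roughly like this (a sketch only; the exact predicates depend on the product-id filters from the question, which I've left out):

```sql
-- Mark only the rows that appear in the migration mapping and are not deleted
UPDATE Tab1 t
   SET t.todo = 1
 WHERE t.t_del_flag = 0
   AND EXISTS (SELECT 1
                 FROM MIGRATION_TABLE m
                WHERE m.m_old_id = t.t_fk1_id);
```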
Now you can work on the partition PMIGRATION and update the data there. This should be much faster than on the original table, as the size of the partition is only 0.5% of the whole table. Don't know about the indexes, though.
Theoretically, you could create a table with the same structure and data as PMIGRATION, work on that table, and once done, swap the partition and the working table with EXCHANGE PARTITION. Don't know about the indexes, again.
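A minimal sketch of that exchange, assuming a staging table tab1_stage with exactly the same column layout as Tab1 (the staging table name is made up here):

```sql
-- Hypothetical staging table populated from the migration partition
CREATE TABLE tab1_stage AS
SELECT * FROM Tab1 PARTITION (pmigration);

-- ... run the heavy updates on tab1_stage here ...

-- Swap the updated staging table back in for the partition.
-- WITHOUT VALIDATION skips checking that every row belongs to the partition;
-- the global indexes from above still need UPDATE GLOBAL INDEXES or a rebuild.
ALTER TABLE Tab1
  EXCHANGE PARTITION pmigration WITH TABLE tab1_stage
  INCLUDING INDEXES WITHOUT VALIDATION
  UPDATE GLOBAL INDEXES;
```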
Upvotes: 0
Reputation: 8361
Method 2 is similar to method 1, but it uses ROWIDs instead of a primary key. In theory it should therefore be a bit faster.
CREATE TABLE migration_temp NOLOGGING AS
SELECT t.t_id,
t.rowid AS rid,
m.m_new_id AS new_fk1_id,
c2.c_id AS new_fk2_id
FROM MIGRATION_TABLE m
JOIN Tab1 t ON t.t_fk1_id = m.m_old_id
JOIN ChildTable c1 ON c1.c_id = t.t_fk2_id
JOIN ChildTable c2 ON c1.c_name = c2.c_name
WHERE t.t_del_flag = 0
ORDER BY t.rowid;
EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'migration_temp');
MERGE INTO Tab1 t USING migration_temp m ON (t.rowid = m.rid)
WHEN MATCHED THEN UPDATE SET
t.t_fk1_id = m.new_fk1_id,
t.t_fk2_id = m.new_fk2_id,
t.t_last_update = trunc(sysdate);
You could think of batching the MERGE based on blocks of ROWIDs. Those tend to be logically colocated, therefore it should be a bit faster.
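One way to sketch that batching, assuming a helper bucket column is added when the temp table is created (the column name and the bucket count of 100 are arbitrary choices):

```sql
-- Assign consecutive ROWID ranges to numbered buckets at creation time
CREATE TABLE migration_temp NOLOGGING AS
SELECT s.*, NTILE(100) OVER (ORDER BY s.rid) AS bucket
FROM (
  SELECT t.t_id, t.rowid AS rid,
         m.m_new_id AS new_fk1_id, c2.c_id AS new_fk2_id
  FROM MIGRATION_TABLE m
  JOIN Tab1 t ON t.t_fk1_id = m.m_old_id
  JOIN ChildTable c1 ON c1.c_id = t.t_fk2_id
  JOIN ChildTable c2 ON c1.c_name = c2.c_name
  WHERE t.t_del_flag = 0
) s;

-- One MERGE per bucket of colocated ROWIDs, committing in between
BEGIN
  FOR b IN 1 .. 100 LOOP
    MERGE INTO Tab1 t
    USING (SELECT * FROM migration_temp WHERE bucket = b) m
    ON (t.rowid = m.rid)
    WHEN MATCHED THEN UPDATE SET
      t.t_fk1_id      = m.new_fk1_id,
      t.t_fk2_id      = m.new_fk2_id,
      t.t_last_update = TRUNC(SYSDATE);
    COMMIT;
  END LOOP;
END;
/
```

Since NTILE orders by rid, each bucket covers a contiguous ROWID range, so each MERGE touches a compact set of blocks.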
Upvotes: 1
Reputation: 8361
Wow, 520 million rows! However, updating 2.5 million of them is only 0.5%, so that should be doable. Not knowing your data, my first assumption is that the self join of Tab1 x Tab1 inside the MERGE takes up most of the time. Possibly also the many joins to the migration and child tables. And the indexes T_IN_1, 2, and 3 need maintenance, too.
As you say the rows to be updated are fixed, I'd try to prepare the heavy work up front. This doesn't lock the table and wouldn't count towards the downtime:
CREATE TABLE migration_temp NOLOGGING AS
SELECT t.t_id,
m.m_new_id AS new_fk1_id,
c2.c_id AS new_fk2_id
FROM MIGRATION_TABLE m
JOIN Tab1 t ON t.t_fk1_id = m.m_old_id
JOIN ChildTable c1 ON c1.c_id = t.t_fk2_id
JOIN ChildTable c2 ON c1.c_name = c2.c_name
WHERE t.t_del_flag = 0;
I omitted the bit with the old/new product_ids because I didn't fully understand how it should work, but that is hopefully not a problem.
Method 1 would be a join via primary keys:
ALTER TABLE migration_temp ADD CONSTRAINT pk_migration_temp PRIMARY KEY(t_id);
EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'migration_temp');
MERGE INTO Tab1 t USING migration_temp m ON (t.t_id = m.t_id)
WHEN MATCHED THEN UPDATE SET
t.t_fk1_id = m.new_fk1_id,
t.t_fk2_id = m.new_fk2_id,
t.t_last_update = trunc(sysdate);
I'm not a fan of batched updates. As you have time estimates, it looks like you have a test system. I'd suggest giving it a go and trying it in one batch.
Upvotes: 0