Reputation: 21
I have 3 databases: Oracle, SQL Server and DB2.
Source DB: Oracle. Target DBs: SQL Server / DB2.
On a daily basis we have to migrate data from Oracle to SQL Server and from Oracle to DB2. In the C# application we are making use of BCP.
In SQL Server, the table has a unique index with IGNORE_DUP_KEY. So if we have duplicate data, it is simply ignored and all other data is copied successfully. https://techcommunity.microsoft.com/t5/sql-server/...
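For context, the SQL Server index is created along these lines (table, column and index names here are just placeholders, not our real schema):

CREATE UNIQUE INDEX ux_target_id
    ON dbo.target_table (id)
    WITH (IGNORE_DUP_KEY = ON);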
However, in DB2, if there is duplicate data, the entire transaction is rolled back. Is there an equivalent of this SQL Server option in DB2, or something we can do via C# BCP? (https://www.ibm.com/support/producthub/db2/docs/co...)
I have come up with an alternative: dump all the data from Oracle into a file and then use DB2 IMPORT, as it skips duplicate records and finishes the transaction. This works, but it is not a great solution because we have to write something specific in the application only for DB2.
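Roughly, the DB2 side of that workaround is a CLP command like the following (the file, schema and table names are placeholders; MESSAGES collects the rejected-duplicate warnings):

IMPORT FROM oracle_dump.del OF DEL
    COMMITCOUNT 10000
    MESSAGES import.msg
    INSERT INTO myschema.target_table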
Thanks
Upvotes: 0
Views: 575
Reputation: 19285
Constructs like MERGE or WHERE NOT EXISTS are not safe under concurrent load. The check for whether the row already exists and the actual insert are two separate actions in the database, and the database doesn't lock the table in between. This means another process may insert the row before your own process does, and then you'll still get a dup-key error. Your statement may work quite well for months ... until one day it doesn't. So beware!
The only solution that I know of which is guaranteed not to error on dup-key is to wrap your statement in a Db2 SQL PL compound statement, like this:
BEGIN
    DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'
    BEGIN -- empty body
    END;
    INSERT INTO t ...;
END;
The result will be that Db2 attempts the insert, but if it fails with a dup-key constraint error (SQLSTATE=23505), that error is silently ignored because our HANDLER for this case has an empty body. I would expect that you can use the above statement in lieu of the plain INSERT statement. C# probably doesn't care what statement it is executing, as long as that statement is valid to the database.
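Purely as an illustration (target_t, staging_t and their columns are made-up names), the same handler can also sit around a row-by-row loop over a staging table, so that only the duplicate rows are skipped and all the others still get inserted:

BEGIN
    DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'
    BEGIN -- duplicate row: do nothing, carry on with the next one
    END;
    FOR r AS SELECT id, col1, col2 FROM staging_t
    DO
        INSERT INTO target_t (id, col1, col2)
            VALUES (r.id, r.col1, r.col2);
    END FOR;
END;

Whether your driver lets you execute such a compound statement dynamically depends on the Db2 version, so treat it as a sketch.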
If you are bulk-loading, and if the bulk-loading mechanism does not try to parallelize the work, then you might be okay with a solution based on MERGE or WHERE NOT EXISTS. Which one is fastest? Dunno about Db2, but for some databases it has been shown that the try-catch method is actually faster than the test-if-already-exists-and-then-insert method, in particular if the violation is a rare event. The try-catch method, as given above, can be said to be an optimistic strategy: we assume/hope that there are no duplicates, but if it turns out there are, then we handle that. Optimistic strategies are generally faster than their pessimistic counterparts.
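For what it's worth, a minimal sketch of the MERGE variant, again with made-up names (target_t keyed on id, fed from staging_t):

MERGE INTO target_t t
USING (SELECT id, col1, col2 FROM staging_t) s
ON (t.id = s.id)
WHEN NOT MATCHED THEN
    INSERT (id, col1, col2)
    VALUES (s.id, s.col1, s.col2)
ELSE IGNORE;

If the staging data itself can contain the same key twice, you still need to de-duplicate it first.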
YMMV.
Upvotes: 0
Reputation: 1270773
One method is to use not exists:
insert into t ( . . . )
    select . . .
    from staging s
    where not exists (select 1 from t where t.id = s.id);
You may need to do this for each unique constraint (an extra and not exists . . . per constraint).
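For example, with a hypothetical second unique constraint on an email column, the statement would grow another clause:

insert into t ( . . . )
    select . . .
    from staging s
    where not exists (select 1 from t where t.id = s.id) and
          not exists (select 1 from t where t.email = s.email);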
If the incoming data itself could have duplicates, then add a row_number() filter:
insert into t ( . . . )
    select . . .
    from (select s.*,
                 row_number() over (partition by id order by id) as seqnum
          from staging s
         ) s
    where seqnum = 1 and
          not exists (select 1 from t where t.id = s.id);
Note that these are not thread-safe, but they should work for bulk loads.
Upvotes: 1