Reputation: 983
I'm feeding data into a SQL Server database and 1 out of every 1000 records is a duplicate due to matters outside my control. It's an exact duplicate -- the entire record, the unique identifier, everything.
I know this can be solved with an 'update' rather than insert step ... or an 'on error, update' instead of insert, perhaps.
But is there a quick and easy way to make SQL Server ignore these duplicates? I haven't made an index/unique constraint yet -- but if I did, I don't want a 'duplicate key' error breaking or interrupting the ETL/data flow process. I just want SQL Server to keep executing the insert query. Is there a way to do this?
Upvotes: 0
Views: 1511
Reputation: 59
Just to be clear for anyone else hitting this issue: for the best performance (at the cost of silently discarding the duplicate rows), define a primary key on the table and create it with IGNORE_DUP_KEY = ON.
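A minimal sketch of this setup -- the table and column names here are made up for illustration; adapt them to your own schema:

```sql
-- IGNORE_DUP_KEY = ON makes SQL Server discard rows that would violate
-- the key with a warning ("Duplicate key was ignored.") instead of
-- raising an error, so the batch keeps running.
CREATE TABLE dbo.IncomingRecords (
    RecordId INT NOT NULL,
    Payload  VARCHAR(100) NULL,
    CONSTRAINT PK_IncomingRecords PRIMARY KEY (RecordId)
        WITH (IGNORE_DUP_KEY = ON)
);

INSERT INTO dbo.IncomingRecords (RecordId, Payload) VALUES (1, 'first');
-- This duplicate is silently dropped (0 rows affected) rather than
-- aborting the insert:
INSERT INTO dbo.IncomingRecords (RecordId, Payload) VALUES (1, 'again');
```

Since the duplicates in the question are exact copies of the whole record, losing the second copy is exactly the desired behavior.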
Upvotes: 1
Reputation: 6713
If you're looking for a duplicate record on every field just use the distinct clause in your select:
Insert into DestinationTable
Select Distinct *
From SourceTable
EDIT:
I misinterpreted your question. You're trying to find a low-impact way to prevent adding a record that already exists in your DestinationTable.
If you want your inserts to remain fast, one way to do it is to add an identity column to your table as the primary key. Let the duplicate records get added, then run a maintenance routine during down or slow time that checks all records added since the last check and deletes any duplicates. Otherwise, there is no easy way: you will have to check on every insert.
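The maintenance routine could be sketched like this -- the table name (dbo.DestinationTable) and columns (Id as the identity key, RecordGuid as the business identifier that repeats) are assumptions for illustration:

```sql
-- Keep the earliest copy of each logical record (lowest identity value)
-- and delete any later duplicates.
WITH Ranked AS (
    SELECT Id,
           ROW_NUMBER() OVER (
               PARTITION BY RecordGuid  -- the identifier the duplicates share
               ORDER BY Id              -- keep the first inserted copy
           ) AS rn
    FROM dbo.DestinationTable
)
DELETE FROM Ranked
WHERE rn > 1;
```

Deleting through the CTE works because each CTE row maps to exactly one row of the base table. In practice you would also restrict the scan to rows added since the last run (e.g. by tracking the highest Id already checked).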
Upvotes: 0
Reputation: 432
Just add a WHERE NOT EXISTS to the statement you're executing. Note that VALUES can't take a WHERE clause, so switch to the INSERT ... SELECT form -
INSERT INTO table (unique_identifier_column, other_column)
SELECT '123', 'blah'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE unique_identifier_column = '123')
Upvotes: 3