user45867
user45867

Reputation: 983

SQL Server - simple discard of duplicate keys/ rows when inserting

I'm feeding data into SQL Server database and 1 out of every 1000 records is a duplicate due to matters outside my control. It's an exact duplicate - the entire record, the unique identifier -- everything.

I know this can solved with an 'updated' rather than insert step ... or 'on error, update' instead of insert, perhaps.

But is there a quick and easy way to make SQL Server ignore these duplicates? I haven't made an index/ unique constraint yet -- but if I did that, I don't want a 'duplicate' key value breaking or interrupting the ETL/ data flow process. I just SQL Server to keep executing the insert query. Is there a way to do this?

Upvotes: 0

Views: 1511

Answers (3)

Matt Jones MSFT
Matt Jones MSFT

Reputation: 59

Just to be clear for anyone else hitting this issue, for the best performance and a slight chance of losing an insert, one should define primary key in the table and use IGNORE_DUP_KEY = ON.

Upvotes: 1

Brian Pressler
Brian Pressler

Reputation: 6713

If you're looking for a duplicate record on every field just use the distinct clause in your select:

Insert into DestinationTable
Select Distinct *
From SourceTable

EDIT: I misinterpreted your question. You're trying to find a low impact way to prevent adding a record that already exists in your DestinationTable.

If you want your inserts to remain fast, one way to do it is to add an identity column to your table as the primary key. Let your duplicate records get added, but then run a maintenance routine on down or slow time that checks all records added since the last check and deletes any added duplicates. Otherwise, there is no easy way... you will have to check on every insert.

Upvotes: 0

James
James

Reputation: 432

Just add a WHERE NOT EXISTS to the statement you're executing -

INSERT INTO table VALUES('123', 'blah') WHERE NOT EXISTS(select top 1 from table where unique_identifier_column = '123')

Upvotes: 3

Related Questions