Reputation: 23187
Let's say I have a temp table of data I want to put into another table, but before doing so, I want to remove any rows in the temp table that already exist in the destination table.
I'm thinking of using a left outer join to identify which rows are duplicates, but then how can I go and delete these rows from my temp table?
Am I taking the right approach, or would it be better to remove duplicates after transferring the data?
Upvotes: 2
Views: 2812
Reputation:
Assuming there is an index on col1/col2, this may perform better than trying to weed out duplicates in the process of the insert:
DELETE t
FROM #temp AS t
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.table
WHERE col1 = t.col1
AND col2 = t.col2
);
An equivalent to Jack's approach would be:
INSERT dbo.table(col1, col2)
SELECT col1, col2
FROM #temp AS t
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.table
WHERE col1 = t.col1
AND col2 = t.col2
);
I'm going to assume you have a unique index on the #temp table so you don't need DISTINCT
or GROUP BY
. You can also delete duplicates from the #temp table as a separate step first, e.g.
;WITH t AS
(
SELECT col1, col2,
rn = ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1)
)
DELETE t WHERE rn > 1;
Whether you're doing the right thing or not, no idea. Will require a lot more information and testing to determine which approach is best. You're going to have to scan 2+ million rows for duplicates no matter which method you end up using.
Upvotes: 2