Reputation:
how to write a statement to accomplish the folowing?
lets say a table has 2 columns (both are nvarchar) with the following data
col1 10000_10000_10001_10002_10002_10002
col2 10____20____10____30____40_____50
I'd like to keep only the following data:
col1 10000_10001_10002
col2 10____10____30
thus removing the duplicates based on the second column values (neither of the columns are primary keys), keeping only those records with the minimal value in the second column.
how to accomplish this?
Upvotes: 2
Views: 1151
Reputation: 1841
This should work for you:
;
WITH NotMin AS
(
SELECT Col1, Col2, MIN(Col2) OVER(Partition BY Col1) AS TheMin
FROM Table1
)
DELETE Table1
--SELECT *
FROM Table1
INNER JOIN NotMin
ON Table1.Col1 = NotMin.Col1 AND Table1.Col2 = NotMin.Col2
AND Table1.Col2 != TheMin
This uses a CTE (like a derived table, but cleaner) and the over clause as a shortcut for less code. I also added a commented select so you can see the matching rows (verify before deleting). This will work in SQL 2005/2008.
Thanks, Eric
Upvotes: 4
Reputation: 1104
Sorry, I misunderstood the question.
SELECT col1, MIN(col2) as col2
FROM table
GROUP BY col1
Of course returns the rows in question, but assuming you can't alter the table to add a unique identifier, you would need to do something like:
DELETE FROM test
WHERE col1 + '|' + col2 NOT IN
(SELECT col1 + '|' + MIN(col2)
FROM test
GROUP BY col1)
Which should work assuming that the pipe character never appears in your set.
Upvotes: 0
Reputation: 89721
Ideally, you'd like to be able to say:
DELETE
FROM tbl
WHERE (col1, col2) NOT IN (SELECT col1, MIN(col2) AS col2 FROM tbl GROUP BY col1)
Unfortunately, that's not allowed in T-SQL, but there is a proprietary extension with a double FROM (using EXCEPT for clarity):
DELETE
FROM tbl
FROM tbl
EXCEPT
SELECT col1, MIN(col2) AS col2 FROM tbl GROUP BY col1
In general:
DELETE
FROM tbl
WHERE col1 + '|' + col2 NOT IN (SELECT col1 + '|' + MIN(col2) FROM tbl GROUP BY col1)
Or other workarounds.
Upvotes: 0