Reputation:

delete all but minimal values, based on two columns in SQL Server table

how to write a statement to accomplish the folowing?

lets say a table has 2 columns (both are nvarchar) with the following data

col1 10000_10000_10001_10002_10002_10002
col2 10____20____10____30____40_____50

I'd like to keep only the following data:

col1 10000_10001_10002
col2 10____10____30

thus removing the duplicates based on the second column values (neither of the columns are primary keys), keeping only those records with the minimal value in the second column.

how to accomplish this?

Upvotes: 2

Answers (3)

Anon246

Reputation: 1841

This should work for you:

;
WITH NotMin AS
(
    SELECT Col1, Col2, MIN(Col2) OVER(Partition BY Col1) AS TheMin
    FROM Table1
)

DELETE Table1
--SELECT * 
FROM Table1
INNER JOIN NotMin
ON Table1.Col1 = NotMin.Col1 AND Table1.Col2 = NotMin.Col2 
    AND Table1.Col2 != TheMin

This uses a CTE (like a derived table, but cleaner) and the over clause as a shortcut for less code. I also added a commented select so you can see the matching rows (verify before deleting). This will work in SQL 2005/2008.

Thanks, Eric

Upvotes: 4

Jason Francis

Reputation: 1104

Sorry, I misunderstood the question.


SELECT col1, MIN(col2) as col2
FROM table
GROUP BY col1

Of course returns the rows in question, but assuming you can't alter the table to add a unique identifier, you would need to do something like:


DELETE FROM test
WHERE col1 + '|' + col2 NOT IN
(SELECT col1 + '|' + MIN(col2)
FROM test
GROUP BY col1)

Which should work assuming that the pipe character never appears in your set.

Upvotes: 0

Cade Roux

Reputation: 89721

Ideally, you'd like to be able to say:

DELETE
FROM tbl
WHERE (col1, col2) NOT IN (SELECT col1, MIN(col2) AS col2 FROM tbl GROUP BY col1)

Unfortunately, that's not allowed in T-SQL, but there is a proprietary extension with a double FROM (using EXCEPT for clarity):

DELETE
FROM tbl
FROM tbl
EXCEPT
    SELECT col1, MIN(col2) AS col2 FROM tbl GROUP BY col1

In general:

DELETE
FROM tbl
WHERE col1 + '|' + col2 NOT IN (SELECT col1 + '|' + MIN(col2) FROM tbl GROUP BY col1)

Or other workarounds.

Upvotes: 0

delete all but minimal values, based on two columns in SQL Server table

Answers (3)

Related Questions