Reputation: 2014

Delete where one column contains duplicates

consider the below:

ProductID  Supplier
---------  --------
111        Microsoft
112        Microsoft
222        Apple Mac
222        Apple
223        Apple

In this example product 222 is repeated because the supplier is known as two names in the data supplied.

I have data like this for thousands of products. How can I delete the duplicate products or select individual results - something like a self join with SELECT TOP 1 or something like that?

Thanks!

Upvotes: 4

Answers (3)

Gordon Linoff

Reputation: 1269753

I think you want to do the following:

select t.*
from (select t.*,
             row_number() over (partition by product_id order by (select NULL)) as seqnum
      from t
     ) t
where seqnum = 1

This selects an arbitrary row for each product.

To delete all rows but one, you can use the same idea:

with todelete (
      (select t.*,
               row_number() over (partition by product_id order by (select NULL)) as seqnum
        from t
      )
delete from to_delete where seqnum > 1

Upvotes: 4

bobs

Reputation: 22184

;WITH Products_CTE AS
    (
    SELECT ProductID, Supplier,
        ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY <some value>) as rn
    FROM PRODUCTS
    )

SELECT *
FROM Products_CTE
WHERE rn = 1

The some value is going to be the key that determines which version of Supplier you keep. If you want the first instance of the supplier, you could use the DateAdded column, if it exists.

Upvotes: 1

John Woo

Reputation: 263723

DELETE  a
FROM    tableName a
        LEFT JOIN
        (
            SELECT  Supplier, MIN(ProductID) min_ID
            FROM    tableName
            GROUP   BY Supplier
        ) b ON  a.supplier = b.supplier AND
                a.ProductID = b.min_ID
WHERE   b.Supplier IS NULL

SQLFiddle Demo

or if you want to delete productID which has more than onbe product

WITH cte 
AS
(
    SELECT  ProductID, Supplier,
            ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Supplier) rn
    FROM    tableName
)
DELETE FROM cte WHERE rn > 1

SQLFiddle Demo

Upvotes: 4

Delete where one column contains duplicates

Answers (3)

Related Questions