TSCAmerica.com
TSCAmerica.com

Reputation: 5377

T- SQL Duplicate Records

I am trying to delete every other record which are duplicate my select query returns every other record duplicate (tblPoints.ptUser_ID) is the unique id

SELECT *, u.usMembershipID
  FROM [ABCRewards].[dbo].[tblPoints]
  inner join tblUsers u on u.User_ID = tblPoints.ptUser_ID
  where ptUser_ID in (select user_id from tblusers where Client_ID = 8)
  and ptCreateDate >= '3/9/2016'
  and ptDesc = 'December Anniversary'

Upvotes: 0

Views: 74

Answers (2)

AXMIM
AXMIM

Reputation: 2472

When cleaning up data duplication, I have always used the same query pattern to delete all the duplicate and keep the wanted one(original, most recent, whatever). The below query pattern delete all duplicates and keep the one you wish to keep.

Just replace all [] with your table and fields.

  • [Field(s)ToDetectDuplications] : Put here the field(s) that allow you to say that they are dupplicate when they have the same values.

  • [Field(s)ToChooseWhichDupplicationIsKept ] : Put here a fields to choose which dupplicate will be kept. For exemple, the one with the biggest value or the less old one.

.

DELETE [YourTableName]
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
            FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1

I recommend to have a look to what will be deleted before. To do so, just replace the "delete" statement with a "select" instead just like below.

SELECT  T.I,
        [YourTableName].*
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
            FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1

Explanation :
Here we use "row_number()", "Partition by" and "Order by" to detect duplicates. "Partition" group together all rows. Set your partitions fields in order to have one row per partition when the data is right. That way bad data come out with partition that have more than one row. Row_number assign them a number. When a number is greater then 1, then this mean there is a duplicate with this partition. The "order by" is use to tell "row_number" in what order to assign them a number. Number 1 is kept, all others are deleted.

Exemple with OP's schema and specification
Here I attempted to fill the patern with guess I have made on your database schema.

DECLARE @userID INT
SELECT @userID = 8

SELECT  T.I,
        [ABCRewards].[dbo].[tblPoints].*
FROM [ABCRewards].[dbo].[tblPoints]
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY T.ptDesc, T.ptUser_ID  ORDER BY ptCreateDate DESC)
            FROM [ABCRewards].[dbo].[tblPoints]
            WHERE T.ptCreateDate >= '3/9/2016'
            AND T.ptDesc = 'December Anniversary'
            AND T.ptUser_ID = @userID
            ) AS T ON [ABCRewards].[dbo].[tblPoints].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1

Upvotes: 1

Fuzzy
Fuzzy

Reputation: 3810

Usually duplicates getting returned by an INNER JOIN suggests an issue with the query but if you are certain that your join is correct then this would do it:

;WITH CTE
     AS (SELECT *
              , ROW_NUMBER() OVER(PARTITION BY t.ptUser_ID ORDER BY t.ptUser_ID) AS rn
         FROM [ABCRewards].[dbo].[tblPoints] AS t)

/*Uncomment below to Review duplicates*/
     --SELECT *
     --FROM CTE
     --WHERE rn > 1;

/*Uncomment below to Delete duplicates*/
    --DELETE 
    --FROM CTE
    --WHERE rn > 1;

Upvotes: 1

Related Questions