Pr0no
Pr0no

Reputation: 4099

Deleting duplicate rows from temp-table

I have a temp-table which contains duplicate rows. I am trying to delete the duplicate rows from this table:

DELETE FROM #Payments 
LEFT OUTER JOIN 
    (
    SELECT 
        CONVERT(uniqueidentifier, MIN(CONVERT(char(36), DocumentNo))) as RowId
       ,[ClearingDoc]
       , [PaymentType]
       , [DocDate]
    FROM #Payments 
    GROUP BY [DocumentNo], [ClearingDoc], [PaymentType], [DocDate]
) 
as KeepRows ON #Payments.RowId = KeepRows.RowId
WHERE KeepRows.RowId IS NULL
;

But I keep getting an error Incorrect syntax near LEFT. I'm probably just staring too long at it, but what am I doing wrong?

Upvotes: 1

Views: 6918

Answers (3)

CRAFTY DBA
CRAFTY DBA

Reputation: 14915

I guess the main question is what is the natural key and what is the differentiator between records?

Let's do a simple T-SQL coding test to clear this up.

The code below creates a simple table using integers, a date and a character.

-- Use a default db
USE [model];
GO

-- Create temp table
CREATE TABLE #Payments
(
DocumentNo int,
DocDate date,
ClearingDoc int,
PaymentType char(1)
);
GO

-- Clear table
TRUNCATE TABLE #Payments
GO

-- Add data
INSERT INTO #Payments VALUES (1, dateadd(d, -5, getdate()), 1, 'A');
INSERT INTO #Payments VALUES (2, dateadd(d, -4, getdate()), 1, 'B');
INSERT INTO #Payments VALUES (2, dateadd(d, -3, getdate()), 1, 'B');
INSERT INTO #Payments VALUES (1, dateadd(d, -2, getdate()), 1, 'A');
GO

-- Show data
SELECT * FROM #Payments
GO

After executing the statement, our data looks like this.

The natural key is document no, clearing doc, and payment type. We want to look for the record with the oldest document date.

enter image description here

I like using a common table expression using an old fashion group by and having clause.

The following code returns the duplicate records with the oldest document date.

-- Find the oldest records
SELECT DocumentNo, ClearingDoc, PaymentType, MIN(DocDate) AS OldestDate
FROM #Payments
GROUP BY DocumentNo, ClearingDoc, PaymentType
HAVING COUNT(*) > 1

enter image description here

Last but not least, package this code inside a CTE/DELETE statement.

-- Remove duplicate data by oldest date
;
WITH CTE_DELETE_LIST AS
(
SELECT 
    DocumentNo, 
    ClearingDoc, 
    PaymentType, 
    MIN(DocDate) AS OldestDate
FROM 
    #Payments
GROUP BY 
    DocumentNo, ClearingDoc, PaymentType
HAVING 
    COUNT(*) > 1
)
DELETE
FROM #Payments
FROM #Payments AS P
JOIN 
    CTE_DELETE_LIST AS C
ON P.DocumentNo = C.DocumentNo and 
   P.ClearingDoc = C.ClearingDoc and
   P.PaymentType = C.PaymentType and 
   P.DocDate = C.OldestDate

This is a snapshot of the table able deleting the two oldest rows.

enter image description here

Upvotes: 0

VictorS
VictorS

Reputation: 231

It will be less resource intesnive to select rows you want to keep into a new temp table. Something like this:

  SELECT distinct [DocumentNo], [ClearingDoc], [PaymentType], [DocDate]
  into #P2 from #Payments 

Or you can add GROUP BY if not all your fields are unique.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269883

It is not clear what you are exactly trying to do. Your query is taking the minimum value of a column that is a group by key.

But, if you wanted to keep the most recent row for each document, you can use row_number():

with todelete as (
      select p.*,
             row_number() over (partition by DocumentNo
                                order by DocDate desc
                               ) as seqnum
      from #payments p
     )
delete from todelete
    where seqnum > 1;

Upvotes: 3

Related Questions