Reputation: 4099
I have a temp-table which contains duplicate rows. I am trying to delete the duplicate rows from this table:
DELETE FROM #Payments
LEFT OUTER JOIN
(
SELECT
CONVERT(uniqueidentifier, MIN(CONVERT(char(36), DocumentNo))) as RowId
,[ClearingDoc]
, [PaymentType]
, [DocDate]
FROM #Payments
GROUP BY [DocumentNo], [ClearingDoc], [PaymentType], [DocDate]
)
as KeepRows ON #Payments.RowId = KeepRows.RowId
WHERE KeepRows.RowId IS NULL
;
But I keep getting an error Incorrect syntax near LEFT
. I'm probably just staring too long at it, but what am I doing wrong?
Upvotes: 1
Views: 6918
Reputation: 14915
I guess the main question is what is the natural key and what is the differentiator between records?
Let's do a simple T-SQL coding test to clear this up.
The code below creates a simple table using integers, a date and a character.
-- Use a default db
USE [model];
GO
-- Create temp table
CREATE TABLE #Payments
(
DocumentNo int,
DocDate date,
ClearingDoc int,
PaymentType char(1)
);
GO
-- Clear table
TRUNCATE TABLE #Payments
GO
-- Add data
INSERT INTO #Payments VALUES (1, dateadd(d, -5, getdate()), 1, 'A');
INSERT INTO #Payments VALUES (2, dateadd(d, -4, getdate()), 1, 'B');
INSERT INTO #Payments VALUES (2, dateadd(d, -3, getdate()), 1, 'B');
INSERT INTO #Payments VALUES (1, dateadd(d, -2, getdate()), 1, 'A');
GO
-- Show data
SELECT * FROM #Payments
GO
After executing the statement, our data looks like this.
The natural key is document no, clearing doc, and payment type. We want to look for the record with the oldest document date.
I like using a common table expression using an old fashion group by and having clause.
The following code returns the duplicate records with the oldest document date.
-- Find the oldest records
SELECT DocumentNo, ClearingDoc, PaymentType, MIN(DocDate) AS OldestDate
FROM #Payments
GROUP BY DocumentNo, ClearingDoc, PaymentType
HAVING COUNT(*) > 1
Last but not least, package this code inside a CTE/DELETE statement.
-- Remove duplicate data by oldest date
;
WITH CTE_DELETE_LIST AS
(
SELECT
DocumentNo,
ClearingDoc,
PaymentType,
MIN(DocDate) AS OldestDate
FROM
#Payments
GROUP BY
DocumentNo, ClearingDoc, PaymentType
HAVING
COUNT(*) > 1
)
DELETE
FROM #Payments
FROM #Payments AS P
JOIN
CTE_DELETE_LIST AS C
ON P.DocumentNo = C.DocumentNo and
P.ClearingDoc = C.ClearingDoc and
P.PaymentType = C.PaymentType and
P.DocDate = C.OldestDate
This is a snapshot of the table able deleting the two oldest rows.
Upvotes: 0
Reputation: 231
It will be less resource intesnive to select rows you want to keep into a new temp table. Something like this:
SELECT distinct [DocumentNo], [ClearingDoc], [PaymentType], [DocDate]
into #P2 from #Payments
Or you can add GROUP BY if not all your fields are unique.
Upvotes: 0
Reputation: 1269883
It is not clear what you are exactly trying to do. Your query is taking the minimum value of a column that is a group by
key.
But, if you wanted to keep the most recent row for each document, you can use row_number()
:
with todelete as (
select p.*,
row_number() over (partition by DocumentNo
order by DocDate desc
) as seqnum
from #payments p
)
delete from todelete
where seqnum > 1;
Upvotes: 3