Reputation: 29
I have a sql database, "persons', that has duplicate entries based on the IDNum column. I need to query the entries and only show the rows or duplicate entries based on the newest creation date. Here is the query:
SELECT IDNum
,PersonPGUID
,CreatedDateTime
,FirstName
,MiddleName
,LastName
FROM [Persons]
WHERE IDNum in (298928, 509520, 528203);
It returns this:
IDNum PersonPGUID CreatedDateTime FirstName MiddleName LastName
298928 C43DEB67-EB04-4066-A374-B8369D508CF6 2007-10-22 10:13:45.000 Jonathan McKinley Bennett-Tisdell
298928 75B63500-7C67-4B54-A2C1-F74EAF98B861 2007-10-22 10:15:35.000 Jonathan McKinley Bennett-Tisdell
509520 57A1FB38-93B5-4D7B-8A17-243EC9330766 2005-02-10 11:55:13.000 Lannon Caleb Morgan
509520 176667F4-318E-415B-AFFC-A80348325A9D 2004-08-24 06:26:27.000 Lannon Caleb Morgan
528203 36C372CB-C9AD-4CEC-8553-8147C7FEDE20 2009-06-08 09:26:43.000 Pedro Trigueros
528203 5B71C0D5-10EB-4375-8F80-E8F01381E08A 2011-12-15 10:28:11.000 Pedro Trigueros
I need to filter to only show on of the duplicates and it need to be the record with the MAX date for each set of duplicates.
Thanks, William
Upvotes: 2
Views: 10968
Reputation: 624
If CreatedDateTime is an entirely unique column throughout (i.e. no two values duplicate):
SELECT * FROM PERSONS WHERE CreatedDateTime IN
(SELECT MAX(CreatedDateTime) from Persons group by IDnum having count(IDnum) > 1)
Hi William,
Thanks for the 10 points, I just thought to myself that the table may also contain rows where the IdNum value is NOT duplicated. Your question shows 3 duplicate IdNum's so I don't know whether unduplicated IdNums exist within it but I assume maybe so. If so you could use this:
SELECT * FROM PERSONS WHERE CreatedDateTime IN (SELECT MAX(CreatedDateTime)
FROM Persons GROUP BY IDnum HAVING COUNT(IDnum) > 1 OR COUNT(IDNUM) = 1)
Upvotes: 2
Reputation: 33391
You can use CTE in conjunction with windowing function to get the latest records for each IDNum
WITH CTE
AS
(
SELECT
IDNum
,PersonPGUID
,CreatedDateTime
,FirstName
,MiddleName
,LastName
, ROW_NUMBER() OVER(PARTITION BY IDNum ORDER BY CreatedDateTime DESC) RN
FROM [Persons]
WHERE IDNum in (298928, 509520, 528203)
)
SELECT IDNum
,PersonPGUID
,CreatedDateTime
,FirstName
,MiddleName
,LastName
FROM CTE
WHERE RN = 1
Upvotes: 1
Reputation: 5679
Try this INNER JOIN. It would give you good performance, because an inline view is used.
SELECT [Persons].*
FROM [Persons]
INNER JOIN
(
SELECT
IDNum,
MAX(CreatedDateTime) AS CreatedDateTime
FROM [Persons]
WHERE IDNum in (298928, 509520, 528203)
GROUP BY IDNum
HAVING COUNT(*) > 1
) latest_records
ON [Persons].IDNum = latest_records.IDNum AND [Persons].CreatedDateTime = latest_records.CreatedDateTime;
Upvotes: 3
Reputation: 2739
Does this work for you? Basically selects the max time and joins on the result
select *
from persons p inner join (
select max(CreatedDatetime) as maxTime, IDNum as maxId from persons group by IDNum) maxTimeSet
on maxTimeSet.maxId = p.IDNum
Upvotes: 0
Reputation: 146603
This is generally done with a correlated subquery:
SELECT IDNum ,PersonPGUID ,CreatedDateTime ,FirstName ,MiddleName ,LastName
FROM [Persons] p
WHERE IDNum in (298928, 509520, 528203)
and CreatedDateTime = (Select max(CreatedDateTime)
Where IDNum = p.IDNum)
Upvotes: 0