Reputation: 1931
I am attempting to find duplicates in a single table, where at least one of those duplicates was created in the last day.
Here is my query:
SELECT DateOfBirth DOB,
FirstName FirstName,
LastName LastName,
COUNT(*) TotalCount
FROM TABLE
WHERE DateOfBirth IS NOT NULL
AND DATEDIFF(d,dateCreated,getDate()) <= 1
GROUP BY DateofBirth, FirstName, LastName
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
The problem is that this query returns nothing, because both duplicates would need to be created within the last day (the way this reads).
I did some testing and found that this datediff requires that the dateCreated
column both be within the datediff.
Any way to bring back these duplicates where the most recent duplicate was created within the last day? Even if the oldest duplicate was created a year ago?
Upvotes: 1
Views: 101
Reputation: 736
This should work, without using a CTE:
SELECT
DOB = t.DateOfBirth
,FirstName = t.FirstName
,LastName = t.LastName
,TotalCount = COUNT(*)
FROM
TABLE t
WHERE
(NOT t.DateOfBirth IS NULL)
AND EXISTS
(
SELECT *
FROM
TABLE nt
WHERE
DATEDIFF(d,nt.dateCreated,getDate()) <= 1
AND nt.FirstName = t.FirstName
AND nt.LastName = t.LastName
AND nt.DateOfBirth = t.DateOfBirth
)
GROUP BY
t.DateofBirth
,t.FirstName
,t.LastName
HAVING
COUNT(*) > 1
ORDER BY
COUNT(*) DESC
Upvotes: 0
Reputation:
I have revised this as an alternative to Aarons answer, in case you wish to see only the duplicates which are not the original record.
;WITH x AS
(
SELECT FirstName, LastName, DateOfBirth, DateCreated,
Row_number() OVER
(
PARTITION BY FirstName, LastName, DateOfBirth
order by dateCreated) as rowNumber
FROM dbo.[TABLE1]
)
SELECT FirstName, LastName, DateOfBirth, DateCreated, rowNumber
FROM x
WHERE rowNumber > 1
AND DateCreated >= DATEADD(DAY, -1, CURRENT_TIMESTAMP);
Upvotes: 1
Reputation: 415820
If you only need to see the most recent, you can get away with a simple self-join:
SELECT t2.*
FROM table t1
INNER JOIN table t2 on t1.DateofBirth = t2.DateofBirth and t1.FirstName = t2.FirstName and t1.Lastname = t2.LastName
WHERE t2.DateCreated <> t1.DateCreated
AND t2.DateCreated > DATEADD(d, -1, current_timestamp)
If you need to see every instance of the record, you could try putting both the t1
and t2
values in the select list in the query above. If you want them as separate records, you can join to the above query as derived table:
SELECT t3.*
FROM Table t3
INNER JOIN (
SELECT t2.DateofBirth, t2.FirstName, t2.LastName
FROM table t1
INNER JOIN table t2 on t1.DateofBirth = t2.DateofBirth and t1.FirstName = t2.FirstName and t1.Lastname = t2.LastName
WHERE t2.DateCreated <> t1.DateCreated
AND t2.DateCreated > DATEADD(d, -1, current_timestamp)
) sub on t3.DateofBirth = sub.DateofBirth and t3.FirstName = sub.FirstName and t3.Lastname = sub.LastName
Upvotes: 0
Reputation:
;WITH x AS
(
SELECT FirstName, LastName, DateOfBirth, DateCreated,
TotalCount = COUNT(*) OVER
(
PARTITION BY FirstName, LastName, DateOfBirth
)
FROM dbo.[TABLE]
)
SELECT FirstName, LastName, DateOfBirth, DateCreated, TotalCount
FROM x
WHERE TotalCount > 1
AND DateCreated >= DATEADD(DAY, -1, CURRENT_TIMESTAMP);
If you then want to eliminate those duplicates that were erroneously created in the last day, just change the outer query to:
;WITH x AS
(
...
)
DELETE x WHERE TotalCount > 1
AND DateCreated >= DATEADD(DAY, -1, CURRENT_TIMESTAMP);
Upvotes: 3