Phoenix
Phoenix

Reputation: 1931

Find Duplicates By Created Date TSQL

I am attempting to find duplicates in a single table, where at least one of those duplicates was created in the last day.

Here is my query:

SELECT DateOfBirth DOB, 
    FirstName FirstName, 
    LastName LastName, 
    COUNT(*) TotalCount
FROM TABLE
WHERE DateOfBirth IS NOT NULL
    AND DATEDIFF(d,dateCreated,getDate()) <= 1
GROUP BY DateofBirth, FirstName, LastName
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC

The problem is that this query returns nothing, because both duplicates would need to be created within the last day (the way this reads).

I did some testing and found that this datediff requires that the dateCreated column both be within the datediff.

Any way to bring back these duplicates where the most recent duplicate was created within the last day? Even if the oldest duplicate was created a year ago?

Upvotes: 1

Views: 101

Answers (4)

outis nihil
outis nihil

Reputation: 736

This should work, without using a CTE:

SELECT 
     DOB =          t.DateOfBirth
    ,FirstName =    t.FirstName
    ,LastName =     t.LastName
    ,TotalCount =   COUNT(*)

FROM 
    TABLE t
WHERE   
    (NOT t.DateOfBirth IS NULL)
    AND EXISTS
       (
        SELECT * 
        FROM 
            TABLE nt 
        WHERE 
            DATEDIFF(d,nt.dateCreated,getDate()) <= 1 
            AND nt.FirstName = t.FirstName 
            AND nt.LastName = t.LastName
            AND nt.DateOfBirth = t.DateOfBirth
        )
GROUP BY 
    t.DateofBirth
    ,t.FirstName
    ,t.LastName
HAVING 
    COUNT(*) > 1
 ORDER BY 
    COUNT(*) DESC

Upvotes: 0

user2990854
user2990854

Reputation:

I have revised this as an alternative to Aarons answer, in case you wish to see only the duplicates which are not the original record.

   ;WITH x AS 
    (
      SELECT FirstName, LastName, DateOfBirth, DateCreated, 
         Row_number() OVER
        (
          PARTITION BY FirstName, LastName, DateOfBirth
        order by dateCreated) as rowNumber
      FROM dbo.[TABLE1]
    )
    SELECT FirstName, LastName, DateOfBirth, DateCreated, rowNumber
      FROM x 
      WHERE rowNumber > 1 
      AND DateCreated >= DATEADD(DAY, -1, CURRENT_TIMESTAMP); 

Upvotes: 1

Joel Coehoorn
Joel Coehoorn

Reputation: 415820

If you only need to see the most recent, you can get away with a simple self-join:

SELECT t2.*
FROM table t1
INNER JOIN table t2 on t1.DateofBirth = t2.DateofBirth and t1.FirstName = t2.FirstName and t1.Lastname = t2.LastName
WHERE t2.DateCreated <> t1.DateCreated 
   AND t2.DateCreated > DATEADD(d, -1, current_timestamp)

If you need to see every instance of the record, you could try putting both the t1 and t2 values in the select list in the query above. If you want them as separate records, you can join to the above query as derived table:

SELECT t3.* 
FROM Table t3
INNER JOIN (
    SELECT t2.DateofBirth, t2.FirstName, t2.LastName
    FROM table t1
    INNER JOIN table t2 on t1.DateofBirth = t2.DateofBirth and t1.FirstName = t2.FirstName and t1.Lastname = t2.LastName
    WHERE t2.DateCreated <> t1.DateCreated 
       AND t2.DateCreated > DATEADD(d, -1, current_timestamp)
) sub on t3.DateofBirth = sub.DateofBirth and t3.FirstName = sub.FirstName and t3.Lastname = sub.LastName

Upvotes: 0

anon
anon

Reputation:

;WITH x AS 
(
  SELECT FirstName, LastName, DateOfBirth, DateCreated, 
    TotalCount = COUNT(*) OVER
    (
      PARTITION BY FirstName, LastName, DateOfBirth
    )
  FROM dbo.[TABLE]
)
SELECT FirstName, LastName, DateOfBirth, DateCreated, TotalCount
  FROM x 
  WHERE TotalCount > 1 
  AND DateCreated >= DATEADD(DAY, -1, CURRENT_TIMESTAMP);

If you then want to eliminate those duplicates that were erroneously created in the last day, just change the outer query to:

;WITH x AS 
(
  ...
)
DELETE x WHERE TotalCount > 1 
  AND DateCreated >= DATEADD(DAY, -1, CURRENT_TIMESTAMP);

Upvotes: 3

Related Questions