Huzaifa M Aamir
Huzaifa M Aamir

Reputation: 383

How to show all potential duplicate matches in a group when record is added on a particular date

I am having issues in coming up with the logic solution for this query

Suppose I have this data set

| id | firstname | lastname | productionunit |           lastmodifieddate |
|----|-----------|----------|----------------|----------------------------|
|  1 |      John |    Smith |          UnitA | December, 29 2016 00:00:00 |
|  2 |    Rachel |    Smith |          UnitA | December, 31 2016 00:00:00 |
|  3 |      Abby |   Turner |          UnitA |  January, 04 2017 00:00:00 |
|  4 |       Sam |    Telly |          UnitA | December, 29 2016 00:00:00 |
|  5 |       Mac |    Telly |          UnitB | December, 30 2016 00:00:00 |
|  6 |   Lincoln |    Telly |          UnitB |  January, 01 2017 00:00:00 |
|  7 |       Sam |    Telly |          UnitC | December, 30 2016 00:00:00 |
|  8 |      Test |    Smith |          UnitD | December, 30 2016 00:00:00 |

I need to find duplicates whenever LastName is the same in a ProductionUnit.

I only want those groups of duplicates if one of the record has been added on 2016-12-30

This is query I have written

SELECT
        a.id,
        a.firstname,
        a.lastname,
        a.productionunit,
        a.lastmodifieddate
    FROM forgerock a
    JOIN (SELECT
            productionunit ,lastname,
            COUNT(*) AS cnt
    FROM forgerock 
               WHERE lastname in ( select lastname from forgerock
                        where lastmodifieddate='2016-12-30')
    GROUP BY productionunit, lastname
     HAVING count(*)>1 ) b
     ON a.lastname = b.lastname and a.productionunit=b.productionunit

This query gives me this result:

| id | firstname | lastname | productionunit |           lastmodifieddate |
|----|-----------|----------|----------------|----------------------------|
|  1 |      John |    Smith |          UnitA | December, 29 2016 00:00:00 |
|  2 |    Rachel |    Smith |          UnitA | December, 31 2016 00:00:00 |
|  5 |       Mac |    Telly |          UnitB | December, 30 2016 00:00:00 |
|  6 |   Lincoln |    Telly |          UnitB |  January, 01 2017 00:00:00 |

This is desired result

| id | firstname | lastname | productionunit |           lastmodifieddate |
|----|-----------|----------|----------------|----------------------------|
|  5 |       Mac |    Telly |          UnitB | December, 30 2016 00:00:00 |
|  6 |   Lincoln |    Telly |          UnitB |  January, 01 2017 00:00:00 |

I believe the issue is with the subquery in my where clause. I searched for all last names that have the last modified date for 12-30, when actually I only want those last names that are within the productionunit that have modified date of 12-30

Hope this makes sense

Thanks

Here is a sqlfiddle link for schema : http://sqlfiddle.com/#!9/63598/1

Upvotes: 0

Views: 28

Answers (1)

You can change your subquery to be simpler, as you are just trying to look for any instance where there is a productionunit/lastname combination edited on that date. The following subquery can do just that

  SELECT DISTINCT
    needle.productionunit,
    needle.lastname
  FROM forgerock needle
  INNER JOIN forgerock haystack
    ON haystack.lastname = needle.lastname
    AND haystack.productionunit = needle.productionunit
    AND haystack.id != needle.id
  WHERE needle.lastmodifieddate = '2016-12-30'

Here it is in your SQL fiddle : http://sqlfiddle.com/#!9/63598/6

Upvotes: 1

Related Questions