Reputation: 5453

Find duplicate records based on two columns

I have a table named employee, there are many records in this table. Here is some sample data :

fullname  | address  |  city
-----------------------------
AA          address1    City1
AA          address3    City1
AA          address8    City2
BB          address5    City2
BB          address2    City1
CC          address6    City1
CC          address7    City2
DD          address4    City1

I want to have a SELECT query in sql server which will show only the duplicate records based on the columns fullname and city. For the given data and considering the condition, only the first two records is duplicate. So my expected output should be like below :

fullname  | address  |  city
-----------------------------
AA          address1    City1
AA          address3    City1

To get this output, I have this query :

select fullname, city from employee group by fullname, city having count(*)>1

As you can see, it selects two columns only and thus it is giving the following output :

fullname  | city
------------------
AA          City1

If I re-write the query like below :

select fullname, city, address from employee group by fullname, city, address 
having count(*)>1

Unfortunately it is showing no records! Can anybody please help me to write the correct query to get the expected result?

Upvotes: 13

Answers (6)

Arunav dutta gupta

Reputation: 109

 SELECT Feild1, Feild2, COUNT() FROM table name GROUP BY Feild1, Feild2 HAVING COUNT()>1

This will give you all yours answer

Upvotes: -1

Jacob Goldhirsch

Reputation: 136

Agree with above answer. But If you don't want to use Windows functions which might not work properly on all DBs you can join to itself on city and full name after the group by and having and then get the addresses

   Select employee.* from employee 
   join (select fullname, city from employee group by fullname, city having count(*)>1) q1 
   on q1.fullname = employee.fullname and q1.city = employee.city

Upvotes: 2

Khorshed Alam

Reputation: 334

Here you go with the solution:

DECLARE  @Employee TABLE
        (
            Fullname VARCHAR(25),
            [Address] VARCHAR(25),
            City VARCHAR(25)
        )

      INSERT INTO @Employee VALUES
      ('AA', 'address1', 'City1') 
      ,('AA', 'address1', 'City1') 
      ,('AA', 'address3', 'City1')
      ,('AA', 'address8', 'City2')
      ,('BB', 'address5', 'City2')
      ,('BB', 'address2', 'City1')
      ,('CC', 'address6', 'City1')
      ,('CC', 'address7', 'City2')

     ;WITH cte AS (
               SELECT *,
                      ROW_NUMBER() OVER(PARTITION BY FullName, [Address], [City] ORDER BY Fullname) AS sl,
                      HashBytes('MD5', FullName + [Address] + [City]) AS RecordId
               FROM   @Employee AS e
           )

      SELECT c.FullName,
             c.[Address],
             c.City
      FROM   cte             AS c
             INNER JOIN cte  AS c1
                  ON  c.RecordId = c1.RecordId
      WHERE  c.sl = 2

Result :

FullName    Address     City
AA          address1    City1
AA          address1    City1

Upvotes: 0

Gordon Linoff

Reputation: 1269463

If you have a unique id or the address is always different, you can try:

select e.*
from employee e
where exists (select 1
              from employee e2
              where e2.fullname = e.fullname and e2.city = e.city and
                    e2.address <> e.address  -- or id or some other unique column
             );

Although I would probably go with the window function approach, you might find that under some circumstances, this is faster (especially if you have an index on employee(fullname, city, address)).

Upvotes: 1

Prathamesh shinde

Reputation: 21

Try the Following Code:

      create table ##Employee
      (Fullname varchar(25),
       Address varchar(25),
       City varchar(25))

      insert into ##Employee values
     (    'AA',          'address1',    'City1')
    ,(    'AA',          'address3',    'City1')
    ,(    'AA',          'address8',    'City2')
    ,(    'BB',          'address5',    'City2')
    ,(    'BB',          'address2',    'City1')
    ,(    'CC',          'address6',    'City1')
    ,(    'CC',          'address7',    'City2')


      select E.* from ##Employee E
      cross apply(
      select Fullname,City,count(Fullname) cnt from ##Employee
      group by Fullname,City
      having Count(Fullname)>1)x
      where E.Fullname=x.Fullname
      and E.City=x.City

Upvotes: 1

Martin Smith

Reputation: 452977

Instead of a grouped COUNT you can use it as a windowed aggregate to access the other columns

SELECT fullname,
       address,
       city
FROM   (SELECT *,
               COUNT(*) OVER (PARTITION BY fullname, city) AS cnt
        FROM   employee) e
WHERE  cnt > 1

Upvotes: 19

Find duplicate records based on two columns

Answers (6)

Related Questions