Alex
Alex

Reputation: 36161

Finding duplicate values in a SQL table

It's easy to find duplicates with one field:

SELECT email, COUNT(email) 
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL
1    John   [email protected]
2    Sam    [email protected]
3    Tom    [email protected]
4    Bob    [email protected]
5    Tom    [email protected]

This query will give us John, Sam, Tom, Tom because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake, and allowed inserting duplicate name and email values. Now I need to remove/change the duplicates, so I need to find them first.

Upvotes: 2515

Views: 3956696

Answers (30)

Ali Adravi
Ali Adravi

Reputation: 22813

I always use CTE & ROW_NUMBER to delete the duplicate records like this:

With Temp AS (
  SELECT ID, NAM, EMAIL,
    ROW_NUMBER() OVER(PARTITION BY NAM, EMAIL Order By ID) AS RowNo
  FROM Users
) 

// To check the duplicate values, just use the delete after verifying
SELECT * FROM Temp WHERE RowNo > 1

Upvotes: 1

Salman Arshad
Salman Arshad

Reputation: 272316

You can use count(*) over (partition by ...) to select all rows that are duplicate of each other. This approach gives you access to all rows and all columns (unlike group by which consolidates duplicates rows and makes ungrouped columns inaccessible).

To select the original rows or delete their duplicates use row_number() over (partition by ... order by ...).

Sample data

create table t (
  id int not null primary key,
  name varchar(100),
  email varchar(100),
  created date
);

insert into t (id, name, email, created) values
(1, 'Alice', '[email protected]', '2021-01-01'),
(2, 'Alice', '[email protected]', '2022-01-01'),
(3, 'Alice', '[email protected]', '2023-01-01'),
(4, 'Bob',   '[email protected]',   '2021-01-01'),
(5, 'Bob',   '[email protected]',   '2022-01-01'),
(6, 'John',  '[email protected]',  '2021-01-01'),
(7, 'Zack',  '[email protected]',  '2021-01-01');

Select all rows that are duplicate of each other

with cte as (
  select t.*, count(*) over (partition by name, email) as dup_count
  from t
)
select *
from cte
where dup_count > 1;

Result

{ Alice, [email protected] } is present three times, all three instances are selected
{ John, [email protected] } is present only once, it is excluded

| id | name  | email             | created    | dup_count |
|----|-------|-------------------|------------|-----------|
| 1  | Alice | [email protected] | 2021-01-01 | 3         |
| 2  | Alice | [email protected] | 2022-01-01 | 3         |
| 3  | Alice | [email protected] | 2023-01-01 | 3         |
| 4  | Bob   | [email protected]   | 2021-01-01 | 2         |
| 5  | Bob   | [email protected]   | 2022-01-01 | 2         |

Select (or delete) the duplicates

The CTE selects all but the oldest row in each set of duplicates
Some RDBMS support delete from CTEs
Or you may use delete from t where id in (...) approach

with cte as (
  select t.*, row_number() over (partition by name, email order by created) as rn
  from t
)
delete
from cte
where rn > 1;

Result after deletion

| id | name  | email             | created    |
|----|-------|-------------------|------------|
| 1  | Alice | [email protected] | 2021-01-01 |
| 4  | Bob   | [email protected]   | 2021-01-01 |
| 6  | John  | [email protected]  | 2021-01-01 |
| 7  | Zack  | [email protected]  | 2021-01-01 |

DB<>Fiddle - SQL Server
DB<>Fiddle - MySQL
DB<>Fiddle - Oracle

Upvotes: 1

Muhammad Tahir
Muhammad Tahir

Reputation: 325

Either a duplicated value is repeated 2 times or greater than 2. Just count them, not groupwise.

select COUNT(distinct col_01) from Table_01

Upvotes: 10

xDBA
xDBA

Reputation: 421

In case you work with Oracle, this way would be preferable:

create table my_users(id number, name varchar2(100), email varchar2(100));

insert into my_users values (1, 'John', '[email protected]');
insert into my_users values (2, 'Sam', '[email protected]');
insert into my_users values (3, 'Tom', '[email protected]');
insert into my_users values (4, 'Bob', '[email protected]');
insert into my_users values (5, 'Tom', '[email protected]');

commit;

select *
from my_users
where rowid not in (
    select min(rowid)
    from my_users
    group by name, email);

Upvotes: 19

Chris Van Opstal
Chris Van Opstal

Reputation: 37567

SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )

Upvotes: 162

shekhar Kumar
shekhar Kumar

Reputation: 419

create table my_table(id int, name varchar(100), email varchar(100));

insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (2, 'Aman', '[email protected]');
insert into my_table values (3, 'Tom', '[email protected]');
insert into my_table values (4, 'Raj', '[email protected]');

Select COUNT(1) As Total_Rows from my_table 
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc 

Upvotes: 12

naveed
naveed

Reputation: 131

 select emp.ename, emp.empno, dept.loc 
 from emp
 inner join dept 
 on dept.deptno=emp.deptno
 inner join
    (select ename, count(*)
    from emp
    group by ename, deptno
    having count(*) > 1) t
 on emp.ename=t.ename
 order by emp.ename

Upvotes: 11

user4877838
user4877838

Reputation: 159

SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;

Upvotes: 13

PRADEEPTA VIRLLEY
PRADEEPTA VIRLLEY

Reputation: 457

SELECT name, email 
FROM users
WHERE email in
    (SELECT email FROM users
    GROUP BY email 
    HAVING COUNT(*)>1)

Upvotes: 43

veritaS
veritaS

Reputation: 519

 select * from Users a
 where exists (select * from Users b 
     where (a.name = b.name 
         or a.email = b.email)
         and a.ID != b.id)

If you search for duplicates who have some kind of prefix or general change like a new domain in mail then you can use replace() at these columns.

Upvotes: 7

Debendra Dash
Debendra Dash

Reputation: 5626

with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]
)
select * from MyCTE where Duplicate>1

Upvotes: 10

Sheriff
Sheriff

Reputation: 868

To delete records whose names are duplicate

;WITH CTE AS    
(
    SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS T FROM     @YourTable    
)
DELETE FROM CTE WHERE T > 1

Upvotes: 3

adesh
adesh

Reputation: 852

SELECT NAME, EMAIL, COUNT(*)
FROM USERS
GROUP BY 1,2
HAVING COUNT(*) > 1

Upvotes: 2

The AG
The AG

Reputation: 690

SELECT * from 
    (SELECT name, email,
        COUNT(name) OVER (PARTITION BY name, email) cnt 
    FROM users)
WHERE cnt > 1;

Upvotes: 3

Indivision Dev
Indivision Dev

Reputation: 1107

SELECT email, GROUP_CONCAT(id)
FROM   users
GROUP  BY email
HAVING COUNT(email) > 1;

Upvotes: 38

ishant kaushik
ishant kaushik

Reputation: 951

Pick the solution which best fits.

Create table NewTable (id int, name varchar(10), email varchar(50))
INSERT  NewTable VALUES (1,'John','[email protected]')
INSERT  NewTable VALUES (2,'Sam','[email protected]')
INSERT  NewTable VALUES (3,'Tom','[email protected]')
INSERT  NewTable VALUES (4,'Bob','[email protected]')
INSERT  NewTable VALUES (5,'Tom','[email protected]')

enter image description here

1. USING GROUP BY CLAUSE

SELECT
    name, email, COUNT(*) AS Occurence
FROM NewTable
GROUP BY name, email
HAVING COUNT(*) > 1

enter image description here

  • The GROUP BY clause groups the rows into groups by values in both name and email columns.
  • Then, the COUNT() function returns the number of occurrences of each group (name,email).
  • Then, the HAVING clause keeps only duplicate groups, which are groups that have more than one occurrence.

2. Using a CTE:

To return the entire row for each duplicate row, join the result of the above query with the NewTable table using a common table expression (CTE):

WITH cte AS (
    SELECT
        name, email, COUNT(*) occurrences
    FROM NewTable
    GROUP BY name, email
    HAVING COUNT(*) > 1
)
SELECT 
    t1.Id, t1.name, t1.email
FROM  NewTable t1
INNER JOIN cte ON 
    cte.name = t1.name AND 
    cte.email = t1.email
ORDER BY 
    t1.name, 
    t1.email;

enter image description here

3. Using function ROW_NUMBER()

WITH cte AS (
    SELECT 
        name, email, 
        ROW_NUMBER() OVER (
            PARTITION BY name,email
            ORDER BY name,email) rownum
    FROM NewTable t1
) 
SELECT 
  * 
FROM cte 
WHERE rownum > 1;

enter image description here

  • ROW_NUMBER() distributes rows of the NewTable table into partitions by values in the name and email columns. The duplicate rows will have repeated values in the name and email columns, but different row numbers
  • Outer query removes the first row in each group.

Upvotes: 20

Tanmay Nehete
Tanmay Nehete

Reputation: 2196

WITH CTE AS
    ( SELECT Id, Name, Age,
        Comments, RN = ROW_NUMBER() OVER (PARTITION BY Name,Age ORDER BY ccn)
    FROM ccnmaster )
select * from CTE 

Upvotes: 23

imomins
imomins

Reputation: 24

select *
FROM TABLENAME
    WHERE PrimaryCoumnID NOT IN
    (
        SELECT MAX(PrimaryCoumnID)
        FROM  TABLENAME
        GROUP BY AnyCoumnID
    );

Upvotes: 0

ashkanyo
ashkanyo

Reputation: 132

DECLARE @myTable TABLE
(
    id INT,
    name VARCHAR(10),
    email VARCHAR(50)
);

INSERT @myTable
VALUES
(1, 'John', 'John-email');
INSERT @myTable
VALUES
(2, 'John', 'John-email');
INSERT @myTable
VALUES
(3, 'fred', 'John-email');
INSERT @myTable
VALUES
(4, 'fred', 'fred-email');
INSERT @myTable
VALUES
(5, 'sam', 'sam-email');
INSERT @myTable
VALUES
(6, 'sam', 'sam-email');

WITH cte
AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rowNum,
           *
    FROM @myTable)
SELECT c1.id,
       c1.name,
       c1.email
FROM cte AS c1
WHERE 1 <
(
    SELECT COUNT(c2.rowNum)
    FROM cte AS c2
    WHERE c1.name = c2.name
          AND c1.email = c2.email
);

Upvotes: 0

KM.
KM.

Reputation: 103667

Try this:

declare @YourTable table (id int, name varchar(10), email varchar(50))

INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT
    name,email, COUNT(*) AS CountOf
    FROM @YourTable
    GROUP BY name,email
    HAVING COUNT(*)>1

OUTPUT:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

If you want the IDs of the dups use this:

SELECT
    y.id,y.name,y.email
    FROM @YourTable y
        INNER JOIN (SELECT
                        name,email, COUNT(*) AS CountOf
                        FROM @YourTable
                        GROUP BY name,email
                        HAVING COUNT(*)>1
                    ) dt ON y.name=dt.name AND y.email=dt.email

OUTPUT:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)

To delete the duplicates try:

DELETE d
    FROM @YourTable d
        INNER JOIN (SELECT
                        y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                        FROM @YourTable y
                            INNER JOIN (SELECT
                                            name,email, COUNT(*) AS CountOf
                                            FROM @YourTable
                                            GROUP BY name,email
                                            HAVING COUNT(*)>1
                                        ) dt ON y.name=dt.name AND y.email=dt.email
                   ) dt2 ON d.id=dt2.id
        WHERE dt2.RowRank!=1
SELECT * FROM @YourTable

OUTPUT:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)

Upvotes: 447

Martin Silovsk&#253;
Martin Silovsk&#253;

Reputation: 329

This selects/deletes all duplicate records except one record from each group of duplicates. So, the delete leaves all unique records + one record from each group of the duplicates.

Select duplicates:

SELECT *
FROM <table>
WHERE
    id NOT IN (
        SELECT MIN(id)
        FROM table
        GROUP BY <column1>, <column2>
);

Delete duplicates:

DELETE FROM <table>
WHERE
    id NOT IN (
        SELECT MIN(id)
        FROM table
        GROUP BY <column1>, <column2>
);

Be aware of larger amounts of records, it can cause performance problems.

Upvotes: 29

gaurav singh
gaurav singh

Reputation: 649

In contrast to other answers you can view the whole records containing all columns if there are any. In the PARTITION BY part of row_number function choose the desired unique/duplicit columns.

SELECT  *
FROM    (
 SELECT a.*
 ,      Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r
 FROM   Customers AS a
)       AS b
WHERE   r > 1;

When you want to select ALL duplicated records with ALL fields you can write it like

CREATE TABLE test (
        id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
,       c1      integer
,       c2      text
,       d       date DEFAULT now()
,       v       text
);

INSERT INTO test (c1, c2, v) VALUES
(1, 'a', 'Select'),
(1, 'a', 'ALL'),
(1, 'a', 'multiple'),
(1, 'a', 'records'),
(2, 'b', 'in columns'),
(2, 'b', 'c1 and c2'),
(3, 'c', '.');
SELECT * FROM test ORDER BY 1;

SELECT  *
FROM    test
WHERE   (c1, c2) IN (
 SELECT c1, c2
 FROM   test
 GROUP  BY 1,2
 HAVING count(*) > 1
)
ORDER   BY 1;

Tested in PostgreSQL.

Upvotes: 64

MT1
MT1

Reputation: 984

In case you work with Microsoft Access, this way works:

CREATE TABLE users (id int, name varchar(10), email varchar(50));

INSERT INTO users VALUES (1, 'John', '[email protected]');
INSERT INTO users VALUES (2, 'Sam', '[email protected]');
INSERT INTO users VALUES (3, 'Tom', '[email protected]');
INSERT INTO users VALUES (4, 'Bob', '[email protected]');
INSERT INTO users VALUES (5, 'Tom', '[email protected]');

SELECT name, email, COUNT(*) AS CountOf
FROM users
GROUP BY name, email
HAVING COUNT(*)>1;

DELETE *
FROM users
WHERE id IN (
    SELECT u1.id 
    FROM users u1, users u2 
    WHERE u1.name = u2.name AND u1.email = u2.email AND u1.id > u2.id
);

Thanks to Tancrede Chazallet for the delete code.

Upvotes: 0

Ankit Jindal
Ankit Jindal

Reputation: 4040

Table structure:

ID   NAME   EMAIL
1    John   [email protected]
2    Sam    [email protected]
3    Tom    [email protected]
4    Bob    [email protected]
5    Tom    [email protected]

Solution 1:

SELECT *,
       COUNT(*)
FROM users t1
INNER JOIN users t2
WHERE t1.id > t2.id
  AND t1.name = t2.name
  AND t1.email=t2.email

Solution 2:

SELECT name,
         email,
       COUNT(*)
FROM users
GROUP BY name,
         email
HAVING COUNT(*) > 1

Upvotes: 1

RyanAbnavi
RyanAbnavi

Reputation: 378

The most important thing here is to have the fastest function. Also indices of duplicates should be identified. Self join is a good option but to have a faster function it is better to first find rows that have duplicates and then join with original table for finding id of duplicated rows. Finally order by any column except id to have duplicated rows near each other.

SELECT u.*
FROM users AS u
JOIN (SELECT username, email
      FROM users
      GROUP BY username, email
      HAVING COUNT(*)>1) AS w
ON u.username=w.username AND u.email=w.email
ORDER BY u.email;

Upvotes: 6

Mohammad Neamul Islam
Mohammad Neamul Islam

Reputation: 283

SELECT name, email,COUNT(email) 
FROM users 
WHERE email IN (
    SELECT email 
    FROM users 
    GROUP BY email 
    HAVING COUNT(email) > 1)

Upvotes: 6

RET
RET

Reputation: 871

The exact code would differ depending on whether you want to find duplicate rows as well or only different ids with the same email and name. If id is a primary key or otherwise has a unique constraint this distinction does not exist, but the question does not specify this. In the former case you can use code given in several other answers:

SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1

In the latter case you would use:

SELECT name, email, COUNT(DISTINCT id)
FROM users
GROUP BY name, email
HAVING COUNT(DISTINCT id) > 1
ORDER BY COUNT(DISTINCT id) DESC

Upvotes: 0

Parkofadown
Parkofadown

Reputation: 646

You can use the SELECT DISTINCT keyword to get rid of duplicates. You can also filter by name and get everyone with that name on a table.

Upvotes: -1

Arun Solomon
Arun Solomon

Reputation: 421

To Check From duplicate Record in a table.

select * from users s 
where rowid < any 
(select rowid from users k where s.name = k.name and s.email = k.email);

or

select * from users s 
where rowid not in 
(select max(rowid) from users k where s.name = k.name and s.email = k.email);

To Delete the duplicate record in a table.

delete from users s 
where rowid < any 
(select rowid from users k where s.name = k.name and s.email = k.email);

or

delete from users s 
where rowid not in 
(select max(rowid) from users k where s.name = k.name and s.email = k.email);

Upvotes: 3

Suraj Kumar
Suraj Kumar

Reputation: 5653

We can use having here which work on aggregate functions as shown below

create table #TableB (id_account int, data int, [date] date)
insert into #TableB values (1 ,-50, '10/20/2018'),
(1, 20, '10/09/2018'),
(2 ,-900, '10/01/2018'),
(1 ,20, '09/25/2018'),
(1 ,-100, '08/01/2018')  

SELECT id_account , data, COUNT(*)
FROM #TableB
GROUP BY id_account , data
HAVING COUNT(id_account) > 1

drop table #TableB

Here as two fields id_account and data are used with Count(*). So, it will give all the records which has more than one times same values in both columns.

We some reason mistakely we had missed to add any constraints in SQL server table and the records has been inserted duplicate in all columns with front-end application. Then we can use below query to delete duplicate query from table.

SELECT DISTINCT * INTO #TemNewTable FROM #OriginalTable
TRUNCATE TABLE #OriginalTable
INSERT INTO #OriginalTable SELECT * FROM #TemNewTable
DROP TABLE #TemNewTable

Here we have taken all the distinct records of the orignal table and deleted the records of original table. Again we inserted all the distinct values from new table to the original table and then deleted new table.

Upvotes: 1

Related Questions