Reputation: 36161
It's easy to find duplicates with one field:
SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1
So if we have a table
ID NAME EMAIL
1 John [email protected]
2 Sam [email protected]
3 Tom [email protected]
4 Bob [email protected]
5 Tom [email protected]
This query will give us John, Sam, Tom, Tom because they all have the same email
.
However, what I want is to get duplicates with the same email
and name
.
That is, I want to get "Tom", "Tom".
The reason I need this: I made a mistake, and allowed inserting duplicate name
and email
values. Now I need to remove/change the duplicates, so I need to find them first.
Upvotes: 2515
Views: 3956696
Reputation: 22813
I always use CTE & ROW_NUMBER to delete the duplicate records like this:
With Temp AS (
SELECT ID, NAM, EMAIL,
ROW_NUMBER() OVER(PARTITION BY NAM, EMAIL Order By ID) AS RowNo
FROM Users
)
// To check the duplicate values, just use the delete after verifying
SELECT * FROM Temp WHERE RowNo > 1
Upvotes: 1
Reputation: 272316
You can use count(*) over (partition by ...)
to select all rows that are duplicate of each other. This approach gives you access to all rows and all columns (unlike group by
which consolidates duplicates rows and makes ungrouped columns inaccessible).
To select the original rows or delete their duplicates use row_number() over (partition by ... order by ...)
.
create table t (
id int not null primary key,
name varchar(100),
email varchar(100),
created date
);
insert into t (id, name, email, created) values
(1, 'Alice', '[email protected]', '2021-01-01'),
(2, 'Alice', '[email protected]', '2022-01-01'),
(3, 'Alice', '[email protected]', '2023-01-01'),
(4, 'Bob', '[email protected]', '2021-01-01'),
(5, 'Bob', '[email protected]', '2022-01-01'),
(6, 'John', '[email protected]', '2021-01-01'),
(7, 'Zack', '[email protected]', '2021-01-01');
with cte as (
select t.*, count(*) over (partition by name, email) as dup_count
from t
)
select *
from cte
where dup_count > 1;
{ Alice, [email protected] } is present three times, all three instances are selected
{ John, [email protected] } is present only once, it is excluded
| id | name | email | created | dup_count |
|----|-------|-------------------|------------|-----------|
| 1 | Alice | [email protected] | 2021-01-01 | 3 |
| 2 | Alice | [email protected] | 2022-01-01 | 3 |
| 3 | Alice | [email protected] | 2023-01-01 | 3 |
| 4 | Bob | [email protected] | 2021-01-01 | 2 |
| 5 | Bob | [email protected] | 2022-01-01 | 2 |
The CTE selects all but the oldest row in each set of duplicates
Some RDBMS support delete from CTEs
Or you may use delete from t where id in (...)
approach
with cte as (
select t.*, row_number() over (partition by name, email order by created) as rn
from t
)
delete
from cte
where rn > 1;
| id | name | email | created |
|----|-------|-------------------|------------|
| 1 | Alice | [email protected] | 2021-01-01 |
| 4 | Bob | [email protected] | 2021-01-01 |
| 6 | John | [email protected] | 2021-01-01 |
| 7 | Zack | [email protected] | 2021-01-01 |
DB<>Fiddle - SQL Server
DB<>Fiddle - MySQL
DB<>Fiddle - Oracle
Upvotes: 1
Reputation: 325
Either a duplicated value is repeated 2 times or greater than 2. Just count them, not groupwise.
select COUNT(distinct col_01) from Table_01
Upvotes: 10
Reputation: 421
In case you work with Oracle, this way would be preferable:
create table my_users(id number, name varchar2(100), email varchar2(100));
insert into my_users values (1, 'John', '[email protected]');
insert into my_users values (2, 'Sam', '[email protected]');
insert into my_users values (3, 'Tom', '[email protected]');
insert into my_users values (4, 'Bob', '[email protected]');
insert into my_users values (5, 'Tom', '[email protected]');
commit;
select *
from my_users
where rowid not in (
select min(rowid)
from my_users
group by name, email);
Upvotes: 19
Reputation: 37567
SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )
Upvotes: 162
Reputation: 419
create table my_table(id int, name varchar(100), email varchar(100));
insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (2, 'Aman', '[email protected]');
insert into my_table values (3, 'Tom', '[email protected]');
insert into my_table values (4, 'Raj', '[email protected]');
Select COUNT(1) As Total_Rows from my_table
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc
Upvotes: 12
Reputation: 131
select emp.ename, emp.empno, dept.loc
from emp
inner join dept
on dept.deptno=emp.deptno
inner join
(select ename, count(*)
from emp
group by ename, deptno
having count(*) > 1) t
on emp.ename=t.ename
order by emp.ename
Upvotes: 11
Reputation: 159
SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;
Upvotes: 13
Reputation: 457
SELECT name, email
FROM users
WHERE email in
(SELECT email FROM users
GROUP BY email
HAVING COUNT(*)>1)
Upvotes: 43
Reputation: 519
select * from Users a
where exists (select * from Users b
where (a.name = b.name
or a.email = b.email)
and a.ID != b.id)
If you search for duplicates who have some kind of prefix or general change like a new domain in mail then you can use replace() at these columns.
Upvotes: 7
Reputation: 5626
with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]
)
select * from MyCTE where Duplicate>1
Upvotes: 10
Reputation: 868
To delete records whose names are duplicate
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS T FROM @YourTable
)
DELETE FROM CTE WHERE T > 1
Upvotes: 3
Reputation: 852
SELECT NAME, EMAIL, COUNT(*)
FROM USERS
GROUP BY 1,2
HAVING COUNT(*) > 1
Upvotes: 2
Reputation: 690
SELECT * from
(SELECT name, email,
COUNT(name) OVER (PARTITION BY name, email) cnt
FROM users)
WHERE cnt > 1;
Upvotes: 3
Reputation: 1107
SELECT email, GROUP_CONCAT(id)
FROM users
GROUP BY email
HAVING COUNT(email) > 1;
Upvotes: 38
Reputation: 951
Pick the solution which best fits.
Create table NewTable (id int, name varchar(10), email varchar(50))
INSERT NewTable VALUES (1,'John','[email protected]')
INSERT NewTable VALUES (2,'Sam','[email protected]')
INSERT NewTable VALUES (3,'Tom','[email protected]')
INSERT NewTable VALUES (4,'Bob','[email protected]')
INSERT NewTable VALUES (5,'Tom','[email protected]')
1. USING GROUP BY CLAUSE
SELECT
name, email, COUNT(*) AS Occurence
FROM NewTable
GROUP BY name, email
HAVING COUNT(*) > 1
2. Using a CTE:
To return the entire row for each duplicate row, join the result of the above query with the NewTable
table using a common table expression (CTE):
WITH cte AS (
SELECT
name, email, COUNT(*) occurrences
FROM NewTable
GROUP BY name, email
HAVING COUNT(*) > 1
)
SELECT
t1.Id, t1.name, t1.email
FROM NewTable t1
INNER JOIN cte ON
cte.name = t1.name AND
cte.email = t1.email
ORDER BY
t1.name,
t1.email;
3. Using function ROW_NUMBER()
WITH cte AS (
SELECT
name, email,
ROW_NUMBER() OVER (
PARTITION BY name,email
ORDER BY name,email) rownum
FROM NewTable t1
)
SELECT
*
FROM cte
WHERE rownum > 1;
ROW_NUMBER()
distributes rows of the NewTable
table into partitions by values in the name
and email
columns. The duplicate rows will have repeated values in the name
and email
columns, but different row numbersUpvotes: 20
Reputation: 2196
WITH CTE AS
( SELECT Id, Name, Age,
Comments, RN = ROW_NUMBER() OVER (PARTITION BY Name,Age ORDER BY ccn)
FROM ccnmaster )
select * from CTE
Upvotes: 23
Reputation: 24
select *
FROM TABLENAME
WHERE PrimaryCoumnID NOT IN
(
SELECT MAX(PrimaryCoumnID)
FROM TABLENAME
GROUP BY AnyCoumnID
);
Upvotes: 0
Reputation: 132
DECLARE @myTable TABLE
(
id INT,
name VARCHAR(10),
email VARCHAR(50)
);
INSERT @myTable
VALUES
(1, 'John', 'John-email');
INSERT @myTable
VALUES
(2, 'John', 'John-email');
INSERT @myTable
VALUES
(3, 'fred', 'John-email');
INSERT @myTable
VALUES
(4, 'fred', 'fred-email');
INSERT @myTable
VALUES
(5, 'sam', 'sam-email');
INSERT @myTable
VALUES
(6, 'sam', 'sam-email');
WITH cte
AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rowNum,
*
FROM @myTable)
SELECT c1.id,
c1.name,
c1.email
FROM cte AS c1
WHERE 1 <
(
SELECT COUNT(c2.rowNum)
FROM cte AS c2
WHERE c1.name = c2.name
AND c1.email = c2.email
);
Upvotes: 0
Reputation: 103667
Try this:
declare @YourTable table (id int, name varchar(10), email varchar(50))
INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')
SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
OUTPUT:
name email CountOf
---------- ----------- -----------
John John-email 2
sam sam-email 2
(2 row(s) affected)
If you want the IDs of the dups use this:
SELECT
y.id,y.name,y.email
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
OUTPUT:
id name email
----------- ---------- ------------
1 John John-email
2 John John-email
5 sam sam-email
6 sam sam-email
(4 row(s) affected)
To delete the duplicates try:
DELETE d
FROM @YourTable d
INNER JOIN (SELECT
y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
) dt2 ON d.id=dt2.id
WHERE dt2.RowRank!=1
SELECT * FROM @YourTable
OUTPUT:
id name email
----------- ---------- --------------
1 John John-email
3 fred John-email
4 fred fred-email
5 sam sam-email
(4 row(s) affected)
Upvotes: 447
Reputation: 329
This selects/deletes all duplicate records except one record from each group of duplicates. So, the delete leaves all unique records + one record from each group of the duplicates.
Select duplicates:
SELECT *
FROM <table>
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY <column1>, <column2>
);
Delete duplicates:
DELETE FROM <table>
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY <column1>, <column2>
);
Be aware of larger amounts of records, it can cause performance problems.
Upvotes: 29
Reputation: 649
In contrast to other answers you can view the whole records containing all columns if there are any. In the PARTITION BY
part of row_number function choose the desired unique/duplicit columns.
SELECT *
FROM (
SELECT a.*
, Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r
FROM Customers AS a
) AS b
WHERE r > 1;
When you want to select ALL duplicated records with ALL fields you can write it like
CREATE TABLE test (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, c1 integer
, c2 text
, d date DEFAULT now()
, v text
);
INSERT INTO test (c1, c2, v) VALUES
(1, 'a', 'Select'),
(1, 'a', 'ALL'),
(1, 'a', 'multiple'),
(1, 'a', 'records'),
(2, 'b', 'in columns'),
(2, 'b', 'c1 and c2'),
(3, 'c', '.');
SELECT * FROM test ORDER BY 1;
SELECT *
FROM test
WHERE (c1, c2) IN (
SELECT c1, c2
FROM test
GROUP BY 1,2
HAVING count(*) > 1
)
ORDER BY 1;
Tested in PostgreSQL.
Upvotes: 64
Reputation: 984
In case you work with Microsoft Access, this way works:
CREATE TABLE users (id int, name varchar(10), email varchar(50));
INSERT INTO users VALUES (1, 'John', '[email protected]');
INSERT INTO users VALUES (2, 'Sam', '[email protected]');
INSERT INTO users VALUES (3, 'Tom', '[email protected]');
INSERT INTO users VALUES (4, 'Bob', '[email protected]');
INSERT INTO users VALUES (5, 'Tom', '[email protected]');
SELECT name, email, COUNT(*) AS CountOf
FROM users
GROUP BY name, email
HAVING COUNT(*)>1;
DELETE *
FROM users
WHERE id IN (
SELECT u1.id
FROM users u1, users u2
WHERE u1.name = u2.name AND u1.email = u2.email AND u1.id > u2.id
);
Thanks to Tancrede Chazallet for the delete code.
Upvotes: 0
Reputation: 4040
Table structure:
ID NAME EMAIL
1 John [email protected]
2 Sam [email protected]
3 Tom [email protected]
4 Bob [email protected]
5 Tom [email protected]
Solution 1:
SELECT *,
COUNT(*)
FROM users t1
INNER JOIN users t2
WHERE t1.id > t2.id
AND t1.name = t2.name
AND t1.email=t2.email
Solution 2:
SELECT name,
email,
COUNT(*)
FROM users
GROUP BY name,
email
HAVING COUNT(*) > 1
Upvotes: 1
Reputation: 378
The most important thing here is to have the fastest function. Also indices of duplicates should be identified. Self join is a good option but to have a faster function it is better to first find rows that have duplicates and then join with original table for finding id of duplicated rows. Finally order by any column except id to have duplicated rows near each other.
SELECT u.*
FROM users AS u
JOIN (SELECT username, email
FROM users
GROUP BY username, email
HAVING COUNT(*)>1) AS w
ON u.username=w.username AND u.email=w.email
ORDER BY u.email;
Upvotes: 6
Reputation: 283
SELECT name, email,COUNT(email)
FROM users
WHERE email IN (
SELECT email
FROM users
GROUP BY email
HAVING COUNT(email) > 1)
Upvotes: 6
Reputation: 871
The exact code would differ depending on whether you want to find duplicate rows as well or only different ids with the same email and name. If id is a primary key or otherwise has a unique constraint this distinction does not exist, but the question does not specify this. In the former case you can use code given in several other answers:
SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1
In the latter case you would use:
SELECT name, email, COUNT(DISTINCT id)
FROM users
GROUP BY name, email
HAVING COUNT(DISTINCT id) > 1
ORDER BY COUNT(DISTINCT id) DESC
Upvotes: 0
Reputation: 646
You can use the SELECT DISTINCT keyword to get rid of duplicates. You can also filter by name and get everyone with that name on a table.
Upvotes: -1
Reputation: 421
To Check From duplicate Record in a table.
select * from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
or
select * from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
To Delete the duplicate record in a table.
delete from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
or
delete from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
Upvotes: 3
Reputation: 5653
We can use having here which work on aggregate functions as shown below
create table #TableB (id_account int, data int, [date] date)
insert into #TableB values (1 ,-50, '10/20/2018'),
(1, 20, '10/09/2018'),
(2 ,-900, '10/01/2018'),
(1 ,20, '09/25/2018'),
(1 ,-100, '08/01/2018')
SELECT id_account , data, COUNT(*)
FROM #TableB
GROUP BY id_account , data
HAVING COUNT(id_account) > 1
drop table #TableB
Here as two fields id_account and data are used with Count(*). So, it will give all the records which has more than one times same values in both columns.
We some reason mistakely we had missed to add any constraints in SQL server table and the records has been inserted duplicate in all columns with front-end application. Then we can use below query to delete duplicate query from table.
SELECT DISTINCT * INTO #TemNewTable FROM #OriginalTable
TRUNCATE TABLE #OriginalTable
INSERT INTO #OriginalTable SELECT * FROM #TemNewTable
DROP TABLE #TemNewTable
Here we have taken all the distinct records of the orignal table and deleted the records of original table. Again we inserted all the distinct values from new table to the original table and then deleted new table.
Upvotes: 1