Reputation: 23
I have a mailing list and there are many entries that are duplicates. I'm trying to find the duplicates so I can remove them. When I run the full query below I get all of the items in the table (142,000 +). When I run the subquery I get only 5768 rows. I need all the columns for each duplicate row to decide which one gets deleted. What am I doing wrong that causes the full query to return all the rows?
select * from Mailinglist
where exists
(select count(*), mailaddress, CenterName
from Mailinglist
group by MailAddress, CenterName
having count(*)>1)
Upvotes: 1
Views: 2775
Reputation: 312
There's no filtering (where) in your subquery so it will allways result something.
SELECT *
FROM Mailinglist AS ML
WHERE EXISTS
(SELECT COUNT(*) AS Expr1, mailaddress, CenterName
FROM Mailinglist AS CNT
WHERE (ML.MailAddress = CNT.MailAddress) AND (ML.CenterName = CNT.CenterName)
GROUP BY mailaddress, CenterName
HAVING (COUNT(*) > 1))
Upvotes: 0
Reputation: 31879
It's because EXISTS
returns true if the subquery returns a row. Your subquery,
is returning one or more rows, and thus returning TRUE
for the EXISTS
condition.
To get the MailingList
with duplicates, you can just run the subquery:
SELECT
COUNT(*),
mailaddress,
CenterName
FROM Mailinglist
GROUP BY
MailAddress, CenterName
HAVING COUNT(*) > 1
To delete the duplicates, you can use ROW_NUMBER
:
WITH Cte AS(
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY MailAddress, Centername ORDER BY (SELECT NULL))
FROM MailingList
)
DELETE FROM Cte WHERE rn > 1
Just replace the ORDER BY
clause depending on which row of the duplicates you want to retain.
Upvotes: 0
Reputation: 2970
EXISTS
returns true if there is even a single record in the query following it.
What you are looking for is
select * from Mailinglist
where mailaddress IN
(
select mailaddress
from Mailinglist
group by MailAddress, CenterName
having count(*)>1
)
Upvotes: 3
Reputation: 72165
You have to do it this way:
select t1.*, t2.cnt
from Mailinglist t1
join (
select count(*) as cnt, mailaddress, CenterName
from Mailinglist
group by MailAddress, CenterName
having count(*)>1
) t2 ON t1.MailAddress = t2.MailAddress and t1.CenterName = t2.CenterName
Using EXISTS
just checks for the existence, or not, of records: if the subquery returns one or more records, then EXISTS
evaluates to true
.
Upvotes: 3