Reputation: 91
I'm trying to write a query to find any duplicate records in my database. I want to find all records (not the count) where the EmailAddress AND DateofBirth (both columns) already exist on another record.
Account tbl contains the EmailAddress.
User tbl contains the DateOfBirth
Join on AccountID
The following query selects records where the EmailAddress exists in another record OR the DateOfBirth exists in another record, but I'm unable to combine the two conditions. If I'm correct so far, the 'and' on line 7 acts more like an 'or' in my case..?
select a.AccountName, a.EmailAddress, u.DateOfBirth from Account as a
join [User] as u
on a.AccountID = u.AccountID
where a.EmailAddress in (
select EmailAddress from Account group by EmailAddress having count(*) > 1
)
and
DateOfBirth in(
select DateOfBirth from [User] group by DateOfBirth having count(*) > 1
)
order by u.DateOfBirth, a.EmailAddress
For example, this may produce 50 records. If I look through them, I find 5 records all with the matching EmailAddress, however only 4 of them have the same DateOfBirth. The 5th record is displaying due to another record in the database with the same DateOfBirth but different EmailAddress.
I'd like to find only those records who have both the matching email and dob.
Thanks as always, please ask if you require a further description.
Regards Json
Upvotes: 0
Views: 1862
Reputation: 1270181
Using your approach, you can use exists
:
select a.AccountName, a.EmailAddress, u.DateOfBirth
from Account as a join
[User] as u
on a.AccountID = u.AccountID
where exists (select EmailAddress
from Account a2 join
[User] u2
on a.AccountID = u.AccountID
where a2.EmailAddress = a.EmailAddress and
u2.DateOfBirth = u.DateOfBirth
group by EmailAddress
having count(*) > 1
)
order by u.DateOfBirth, a.EmailAddress;
A better way is to use window/analytic functions:
select AccountName, EmailAddress, DateOfBirth
from (select a.AccountName, a.EmailAddress, u.DateOfBirth,
count(*) over (partition by a.EmailAddress, u.DateOfBirth) as cnt
from Account as a join
[User] as u
on a.AccountID = u.AccountID
) ua
where cnt > 1
order by DateOfBirth, EmailAddress;
Upvotes: 2
Reputation: 20843
count(*) > 1
(using the HAVING
expression).In MySQL (I have no MS SQL server available at the moment), this can be done with:
SELECT * FROM a JOIN b ON a.account = b.account
GROUP BY email, birth
HAVING count(*) > 1;
Where I used the following commands to setup the tables a
and b
:
create table a (
account int primary key auto_increment,
email text
);
create table b (
account int,
birth date,
constraint foreign key (account) references a (account)
);
insert into a (email) values ("email1"), ("email1"), ("email2"), ("email2");
insert into b values (1, "2000-01-01"), (2, "2000-01-01"), (3, "2000-01-01"), (4, "2000-01-02");
Upvotes: 0