MJay
MJay

Reputation: 91

SQL find records with duplicate email and date of birth

I'm trying to write a query to find any duplicate records in my database. I want to find all records (not the count) where the EmailAddress AND DateofBirth (both columns) already exist on another record.

Account tbl contains the EmailAddress.

User tbl contains the DateOfBirth

Join on AccountID

The following query selects records where the EmailAddress exists in another record OR the DateOfBirth exists in another record, but I'm unable to combine the two conditions. If I'm correct so far, the 'and' on line 7 acts more like an 'or' in my case..?

select a.AccountName, a.EmailAddress, u.DateOfBirth from Account as a
join [User] as u
on a.AccountID = u.AccountID
where a.EmailAddress in (
select EmailAddress from Account group by EmailAddress having count(*) > 1
)
and
       DateOfBirth in(
select DateOfBirth from [User] group by DateOfBirth having count(*) > 1
)
order by u.DateOfBirth, a.EmailAddress

For example, this may produce 50 records. If I look through them, I find 5 records all with the matching EmailAddress, however only 4 of them have the same DateOfBirth. The 5th record is displaying due to another record in the database with the same DateOfBirth but different EmailAddress.

I'd like to find only those records who have both the matching email and dob.

Thanks as always, please ask if you require a further description.

Regards Json

Upvotes: 0

Views: 1862

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270181

Using your approach, you can use exists:

select a.AccountName, a.EmailAddress, u.DateOfBirth
from Account as a join
     [User] as u
     on a.AccountID = u.AccountID
where exists (select EmailAddress
              from Account a2 join
                   [User] u2
                   on a.AccountID = u.AccountID
              where a2.EmailAddress = a.EmailAddress and
                    u2.DateOfBirth = u.DateOfBirth
              group by EmailAddress
              having count(*) > 1
             )
order by u.DateOfBirth, a.EmailAddress;

A better way is to use window/analytic functions:

select AccountName, EmailAddress, DateOfBirth
from (select a.AccountName, a.EmailAddress, u.DateOfBirth,
             count(*) over (partition by a.EmailAddress, u.DateOfBirth) as cnt
      from Account as a join
           [User] as u
           on a.AccountID = u.AccountID
     ) ua
where cnt > 1
order by DateOfBirth, EmailAddress;

Upvotes: 2

Micha Wiedenmann
Micha Wiedenmann

Reputation: 20843

  1. Join the two tables on the account id.
  2. Group by email and date
  3. Show only those entries which have count(*) > 1 (using the HAVING expression).

In MySQL (I have no MS SQL server available at the moment), this can be done with:

SELECT * FROM a JOIN b ON a.account = b.account
  GROUP BY email, birth
  HAVING count(*) > 1;

Where I used the following commands to setup the tables a and b:

create table a (
  account int primary key auto_increment,
  email text
);
create table b (
  account int,
  birth date,
  constraint foreign key (account) references a (account)
);

insert into a (email) values ("email1"), ("email1"), ("email2"), ("email2");
insert into b values (1, "2000-01-01"), (2, "2000-01-01"), (3, "2000-01-01"), (4, "2000-01-02");

Upvotes: 0

Related Questions