John S
John S

Reputation: 8331

Finding specific duplicates when one field is different

I have a SQL DB table that has some data duplication. I need to find records based on the fact that none of the "duplicate" records has a value of Null value in one of the fields. i.e.

ID    Name   StartDate
1     Fred   1/1/1945
2     Jack   2/2/1985
3     Mary   3/3/1999
4     Fred   null 
5     Jack   5/5/1977
6     Jack   4/4/1985
7     Fred   10/10/2001

In the example above I need to find Jack and Mary but not Fred. I assume some sort of Self Join or Union but have run into a mental block on what exactly would give me my desired results.

Upvotes: 1

Views: 52

Answers (4)

Marshall Tigerus
Marshall Tigerus

Reputation: 3764

There are a couple of components to this:

  1. Selecting Duplicates

this is the easy one. Depending on your version of SQL, you basically do:

 select count(*) as numbers, Name from this_table having numbers >= 2 group by Name
  1. Removing those with Null

This one increases complexity, but ultimately isn't bad.

 select count(*) as numbers, Name from this_table  where StartDate is not null having numbers >= 2 group by Name
  1. Dealing with duplicates that are also null This is a case not mentioned in your original question. Say I have someone with three entires, two are dates but one is null. Do we want these records, or not? If we do, then the query in part 2 works. If we do not, then we want a sub-select, like:

    select count(*) as numbers, Name from this_table where Name not in (select Name from this_table where StartDate is null) having numbers >= 2 group by Name

That said, these queries are not very useful unless you have something more uniquely identifying than a name. First names especially are overly common.

Upvotes: 0

David
David

Reputation: 3418

Another option: first we select all rectords where no field is null, we group them and count them. Then we select just those that have a total > 1

select * from (
    select name, startdate, count(*) as total from so group by name, startdate having name is not null and startdate is not null
) as data where total > 1

Upvotes: 1

Avery Payne
Avery Payne

Reputation: 1748

Ok, went back and re-read the question. It sounds like you need a sub-select instead of a join, although a join would work too.

WHERE Name NOT IN ( SELECT DISTINCT Name FROM table WHERE StartDate IS NULL )

should give the desired results, eliminating ALL Fred records based on the fact that Fred qualified with a single NULL date.

Upvotes: 1

D Stanley
D Stanley

Reputation: 152556

First create the query to find duplicates, then add a condition that it not have a record with a NULL StartDate

SELECT Name
FROM myTable
GROUP BY Name
HAVING COUNT(*) > 1
WHERE Name NOT IN (SELECT Name FROM myTable WHERE StartDate IS NULL)

Upvotes: 1

Related Questions