Reputation: 8331
I have a SQL DB table that has some data duplication. I need to find records based on the fact that none of the "duplicate" records has a value of Null value in one of the fields. i.e.
ID Name StartDate
1 Fred 1/1/1945
2 Jack 2/2/1985
3 Mary 3/3/1999
4 Fred null
5 Jack 5/5/1977
6 Jack 4/4/1985
7 Fred 10/10/2001
In the example above I need to find Jack and Mary but not Fred. I assume some sort of Self Join or Union but have run into a mental block on what exactly would give me my desired results.
Upvotes: 1
Views: 52
Reputation: 3764
There are a couple of components to this:
this is the easy one. Depending on your version of SQL, you basically do:
select count(*) as numbers, Name from this_table having numbers >= 2 group by Name
This one increases complexity, but ultimately isn't bad.
select count(*) as numbers, Name from this_table where StartDate is not null having numbers >= 2 group by Name
Dealing with duplicates that are also null This is a case not mentioned in your original question. Say I have someone with three entires, two are dates but one is null. Do we want these records, or not? If we do, then the query in part 2 works. If we do not, then we want a sub-select, like:
select count(*) as numbers, Name from this_table where Name not in (select Name from this_table where StartDate is null) having numbers >= 2 group by Name
That said, these queries are not very useful unless you have something more uniquely identifying than a name. First names especially are overly common.
Upvotes: 0
Reputation: 3418
Another option: first we select all rectords where no field is null, we group them and count them. Then we select just those that have a total > 1
select * from (
select name, startdate, count(*) as total from so group by name, startdate having name is not null and startdate is not null
) as data where total > 1
Upvotes: 1
Reputation: 1748
Ok, went back and re-read the question. It sounds like you need a sub-select instead of a join, although a join would work too.
WHERE Name NOT IN ( SELECT DISTINCT Name FROM table WHERE StartDate IS NULL )
should give the desired results, eliminating ALL Fred records based on the fact that Fred qualified with a single NULL date.
Upvotes: 1
Reputation: 152556
First create the query to find duplicates, then add a condition that it not have a record with a NULL StartDate
SELECT Name
FROM myTable
GROUP BY Name
HAVING COUNT(*) > 1
WHERE Name NOT IN (SELECT Name FROM myTable WHERE StartDate IS NULL)
Upvotes: 1