Reputation: 5822
I need SQL code that will identify possible duplicates in a table. Lets say my table has 4 columns:
ID (primary key)
Date1
Date2
GroupID
(Date1, Date2, GroupID) form a unique key.
This table gets populated with blocks of data at a time, and it often happens that a new block is loaded in that contains a number of records that are already in there. This fine as long as the unique key catches them. Unfortunately, sometimes Date1 is empty (or at least '1900/01/01') either with the first or subsequent uploads.
So what I need is something to identify where the (Date2, GroupID) combination appear more than once and where for one of the records Date1 = '1900/01/01'
Thanks
Karl
Upvotes: 2
Views: 1234
Reputation: 33980
If I understand correctly, you are looking for a group of IDs for which GroupID and Date2 are the same, there's one occurance of Date1 that's different from 1900/01/01, and all the rest of the Date1s are 1900/01/01.
If I got it right, here's the query for you:
SELECT T.ID
FROM Table T1
WHERE
(T1.GroupID, T1.Date2) IN
(SELECT T2.GroupID, T2.Date2
WHERE T2.Date1 = '1900/01/01' OR
T2.Date IS NULL
GROUP BY T2.GroupID, T2.Date2)
AND
1 >=
(
SELECT COUNT(*)
FROM TABLE T3
WHERE NOT (T3.Date1 = '1900/01/01')
AND NOT (T3.Date1 IS NULL)
AND T3.GroupID = T1.GroupID
AND T3.Date2 = T1.Date2
)
Hope that helps.
Upvotes: 1
Reputation: 562260
This is the most straightforward way I can think to do it:
SELECT DISTINCT t1.*
FROM t t1 JOIN t t2 USING (date2, groupid)
WHERE t1.date1 = '1900/01/01';
No need to use GROUP BY
, which performs poorly on some brands of database.
Upvotes: 0
Reputation: 19666
bkm kind of has it, but the inner select can perform poorly on some databases. This is more straightforward:
select t1.* from
t as t1 left join t as t2
on (t1.date2=t2.date2 and t1.groupid=t2.groupid)
where t1.id != t2.id and (t1.date1='1900/01/01' or t2.date2='1900/01/01')
Upvotes: 2
Reputation: 8261
select * from table a
join (
select Date2, GroupID, Count(*)
from table
group by Date2, GroupID
having count(*) > 1
) b on (a.Date2 = b.Date2 and a.GroupID = b.GroupID)
where a.Date1 = '1900/01/01'
Upvotes: 0
Reputation: 966
A check constraint perhaps.
Something along the lines of select count(*) where date1 = '1900/01/01' and date2 = @date2 and groupid = @groupid
.
Just need to see if you can do this in a table-level constraint ....
Upvotes: 0
Reputation: 983
You can identify duplicates on (date2, GroupID) using
Select date2,GroupID
from t
group by (date2,GroupID)
having count(*) >1
Use this to identify records in main table that are duplicates:
Select *
from t
where date1='1900/01/01'
and (date2,groupID) = (Select date2,GroupID
from t
group by (date2,GroupID)
having count(*) >1)
NOTE: Since Date1, Date2, GroupID forms a unique key, check if your design is right in allowing Date1 to be NULL. You could have a genuine case where Date 1 is different for two rows while (date2,GroupID) is the same
Upvotes: 1
Reputation: 19603
In addition to having a PRIMARY KEY field defined on the table, you can also add other UNIQUE constraints to perform the same sort of thing you're asking for. They'll validate that a particular column or set of columns have a unique value in the table.
Check out the entry in the MySQL manual for an example:
http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Upvotes: 0