Reputation: 605
I have a table with about 11 fields (columns), and I know for a fact that there are multiple repeating entries, but they are repeating only within few fields. What I want to do is that if two or more rows have identical ProviderName
and Address
in lowercase, and identical StateID
, then I want to delete those extra rows, and leave just one copy of it. I have something like this, but I have no idea how to complete it.
SET SQL_SAFE_UPDATEs=0;
DELETE FROM providers
WHERE LCASE(ProviderName), LCASE(Address), StateID;
UPDATE:
SET SQL_SAFE_UPDATES=0;
DELETE p1.*
FROM providers AS p1
JOIN providers AS p2
ON LCASE(p1.ProviderName) = LCASE(p2.ProviderName)
AND LCASE(p1.Address) = LCASE(p2.Address)
AND p1.Zip = p2.Zip
AND p1.StateID = p2.StateID
AND p1.ProviderId > p2.ProviderId
Upvotes: 0
Views: 71
Reputation: 780688
This answer assumes you have a unique id
field that distinguishes the rows that have duplicate values. It will keep the row with the lowest id
.
DELETE p1.*
FROM providers AS p1
JOIN providers AS p2
ON LCASE(p1.ProviderName) = LCASE(p2.ProviderName)
AND LCASE(p1.Address) = LCASE(p2.Address)
AND p1.StateID = p2.StateID
AND p1.id > p2.id
Note that if your table uses case-insensitive collation, you don't need to call LCASE
.
A more efficient query might be:
DELETE p1.*
FROM providers AS p1
JOIN (SELECT LCASE(ProviderName) AS Name, LCASE(Address) AS Addr, StateID, MIN(id) AS id
FROM providers
GROUP BY Name, Addr, StateID
HAVING COUNT(*) > 1) AS p2
ON LCASE(p1.ProviderName) = p2.Name
AND LCASE(p1.Address) = p2.Addr
AND p1.StateID = p2.StateID
AND p1.id > p2.id
The cross-product is much smaller because we first group all the providers with identical values. Also, the HAVING
clause filters out all the non-duplicates.
Upvotes: 1
Reputation: 408
To leave entry with highest id:
SET SQL_SAFE_UPDATES = 0;
DELETE n1 FROM provider n1, provider n2 WHERE n1.id < n2.id AND
LCASE(n1.ProviderName) = LCASE(n2.ProviderName) and LCASE(n1.Address) =
LCASE(n2.Address)
and n1.StateId = n2.StateId
Upvotes: 0