Reputation: 171
I have a table > 500,000 rows where there are duplicate entries that need to be removed but only where a few columns match.
The main table has the following columns
id,
countryID,
postalCode,
adminName1,
adminName2,
placeName,
adminName3,
latitude,
longitude
I need to remove duplicates (leaving the first record) where placeName, latitude & longitude match
I had a search and found this which looks right but doesnt work for me. I have duplicated the original table structure into a new table (tblTemp)
INSERT INTO tblTemp(id,countryID,postalCode,adminName1,adminName2,placeName,adminName3,latitude,longitude)
SELECT DISTINCT placeName,latitude,longitude
FROM tblCountry_admin;
But i get the error
Column count doesn't match value count at row 1
Upvotes: 0
Views: 1787
Reputation: 133410
if you want delete the duplicate row by laceName, latitude & longitude leaving the the old one you could
You could check for select the duplicated rows this wat
select * from tblCountry_admin
where id not in (
select min(id)
from tblCountry_admin
group by placename, latitude, longitude
)
then you could delete this way
delete from tblCountry_admin
where id not in (
select min(id)
from tblCountry_admin
group by placename, latitude, longitude
)
the error you get in you insert select id due by the fact the number of column in insert don't match the number of column is select
Upvotes: 1
Reputation: 39537
Assuming the tblTemp
has same set of columns as tblCountry_admin
and you want to get the rows with max id in case of duplicates, you can use this:
INSERT INTO tblTemp
select a.*
from tblCountry_admin a left join tblCountry_admin b on a.placeName = b.placeName
and a.latitude = b.latitude
and a.longitude = b.longitude
and a.id < b.id
where b.id is null;
If you want to create the table using the select use:
create table tblTemp as
select a.*
from tblCountry_admin a left join tblCountry_admin b on a.placeName = b.placeName
and a.latitude = b.latitude
and a.longitude = b.longitude
and a.id < b.id
where b.id is null;
Upvotes: 0
Reputation: 1271171
Use WHERE
:
INSERT INTO tblTemp (id, countryID, postalCode, adminName1,adminName2,
placeName, adminName3, latitude, longitude)
SELECT id, countryID, postalCode, adminName1, adminName2,
placeName, adminName3, latitude, longitude
FROM tblCountry_admin a
WHERE a.id = (SELECT MIN(a2.id)
FROM tblCountry_admin a2
WHERE a2.placeName = a.placeName AND
a2.latitude = a.latitude AND
a2.longitude = a.longitude
);
Upvotes: 0