Reputation: 1741
I have a table with 28 million records, but now it has 56 million because I assumed the LOAD DATA LOCAL INFILE
command would ignore rows that were already in the table. Now I need a way to efficiently remove the duplicate rows. What is the best way to approach this?
If I do not want to touch my table, can I just select the unique rows with a statement like this?
select distinct lat, lon from A;
Upvotes: 1
Views: 155
Reputation: 2575
If you cannot touch the table and still have to use it, why don't you create a view that shows you only the distinct records?
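A minimal sketch of that idea, assuming the table is the question's A and the duplicates are exact copies of the (lat, lon) pairs:

CREATE VIEW a_distinct AS    -- hypothetical view name
SELECT DISTINCT lat, lon
FROM A;

Note that the view dedupes on every read, so at 56 million rows you avoid touching the table at the cost of query time.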
Upvotes: 0
Reputation: 3493
Select the originals into a new/temp table, delete the 56 million records, then insert your originals back.
Example: INSERT INTO new_fresh_table SELECT a, b, c, d FROM table_with_dupes GROUP BY a, b, c, d
If you've somehow duped your IDs as well (not sure how that's possible with a PK), you need to GROUP BY every single column. You can write a SELECT against the metadata to write that SELECT for you, as sketched below.
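A sketch of that metadata trick in MySQL, reusing the table name from the example above and a placeholder schema name; information_schema.columns supplies the column list:

SELECT CONCAT(
         'INSERT INTO new_fresh_table SELECT ',
         GROUP_CONCAT(column_name ORDER BY ordinal_position),
         ' FROM table_with_dupes GROUP BY ',
         GROUP_CONCAT(column_name ORDER BY ordinal_position)
       ) AS rebuild_sql
FROM information_schema.columns
WHERE table_schema = 'your_db'            -- placeholder schema name
  AND table_name   = 'table_with_dupes';

Then run the generated statement it returns. (If the table is wide, raise group_concat_max_len first so the column list isn't truncated.)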
Upvotes: 5
Reputation: 6062
You didn't specify how the records are duped. Is it Primary Key? Name? What?
From O'Reilly's SQL Cookbook (highly recommended, even for SQL pros):
delete from dupes
where id not in ( select min(id) from dupes group by name )
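One caveat if this is MySQL (the question's LOAD DATA LOCAL INFILE suggests it is): MySQL rejects a DELETE whose subquery reads from the same table (error 1093), so the cookbook pattern is usually rewritten as a multi-table DELETE against a derived table. A sketch, reusing the cookbook's id and name columns:

delete d
from dupes as d
join (
    select name, min(id) as keep_id   -- one survivor per name
    from dupes
    group by name
) as k on d.name = k.name
where d.id <> k.keep_id;              -- drop everything but the survivor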
Upvotes: 0