cool_cs

Reputation: 1741

Efficiently Removing duplicate rows

I had a table with 28 million records, but now it has 56 million because I assumed the LOAD DATA LOCAL INFILE command would ignore rows that were already in the table. Now I need a way to efficiently remove the duplicate rows. What is the best way to approach this?

If I do not want to touch my table, can I just select the unique rows with this statement:

select distinct (l1.lat, l2.lon) from A, B;

Upvotes: 1

Views: 155

Answers (3)

Farhan

Reputation: 2575

If you cannot touch the table, and have to use it, why don't you create a view that only shows you the distinct records?
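A minimal sketch, assuming the table is named coords and the duplicated columns are lat and lon (all hypothetical names):

-- the view deduplicates on the fly; nothing extra is stored
CREATE VIEW coords_unique AS
    SELECT DISTINCT lat, lon
    FROM coords;

Keep in mind the DISTINCT runs on every read, so on 56 million rows each query against the view pays the dedupe cost; this avoids touching the table, but it doesn't shrink it.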

Upvotes: 0

bluevector

Reputation: 3493

Select the originals into a new/temp table, delete the 56 million duped records, then insert your originals back.

Example:

INSERT INTO new_fresh_table
SELECT a, b, c, d
FROM table_with_dupes
GROUP BY a, b, c, d;

If you've somehow duped your IDs as well (not sure how that's possible with a PK), you need to GROUP BY every single column. Write a SELECT against the metadata tables to write your SELECT for you, as in the sketch below.
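One possible version of that metadata trick in MySQL, assuming the schema is named my_schema and the tables are named as in the example above (all placeholders):

-- builds the full INSERT ... GROUP BY statement across every column
SELECT CONCAT(
    'INSERT INTO new_fresh_table SELECT ',
    GROUP_CONCAT(column_name ORDER BY ordinal_position),
    ' FROM table_with_dupes GROUP BY ',
    GROUP_CONCAT(column_name ORDER BY ordinal_position)
) AS generated_sql
FROM information_schema.columns
WHERE table_schema = 'my_schema'
  AND table_name = 'table_with_dupes';

Copy the generated_sql value out and run it as the actual dedupe statement.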

Upvotes: 5

BryanH

Reputation: 6062

You didn't specify how the records are duped. Is it the primary key? A name? What?

From O'Reilly's SQL Cookbook (highly recommended, even for SQL pros):

delete from dupes
  where id not in ( select min(id) from dupes group by name )
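
One caveat: MySQL rejects a subquery that selects from the same table the DELETE targets (error 1093), so on MySQL this form may need the subquery wrapped in a derived table. A sketch, assuming the dupes are keyed by name as in the book's example:

DELETE FROM dupes
WHERE id NOT IN (
    -- the extra derived table materializes the ids to keep,
    -- which sidesteps error 1093
    SELECT keep_id FROM (
        SELECT MIN(id) AS keep_id
        FROM dupes
        GROUP BY name
    ) AS keepers
);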

Upvotes: 0
