Ding Ding

Reputation: 306

How to remove duplicate items in MySQL with a dataset of 20 million rows?

I've got a big MySQL database and need to delete the duplicate items quickly. Here's how it looks:

id | text1 | text2|    
1  | 23    |  43  |   
2  | 23    |  44  |  
3  | 23    |  44  |

After the deletion, the remaining table should be:

id | text1 | text2|   
1  | 23    |  43  |   
3  | 23    |  44  |

I don't care about the id. What matters most is that every distinct (text1, text2) pair keeps exactly one row, so no item disappears completely.

Upvotes: 1

Views: 388

Answers (3)

Olexa

Reputation: 587

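-- MySQL rejects a subquery that reads the table being deleted from
-- (error 1093), so the MIN(id) list is wrapped in a derived table.
DELETE FROM t WHERE id NOT IN
(SELECT id FROM (SELECT MIN(id) AS id FROM t GROUP BY text1, text2) AS keep_ids)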
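Another way to do the same cleanup is a multi-table DELETE that joins the table to itself. MySQL accepts the target table in a join even though it rejects it in a subquery of the same DELETE. This is a sketch under assumptions: `dedup_demo` is a hypothetical scratch table holding the question's sample rows, the VARCHAR(100) column types are a guess, and an index on (text1, text2) is assumed so each row's duplicates are found through the index rather than by a full scan.

```sql
-- Hypothetical scratch table with the sample rows from the question;
-- substitute your real table name (and drop this setup) on real data.
DROP TABLE IF EXISTS dedup_demo;
CREATE TABLE dedup_demo (
  id    INT PRIMARY KEY,
  text1 VARCHAR(100),
  text2 VARCHAR(100),
  INDEX idx_text (text1, text2)  -- lets the self-join look up duplicates by index
);
INSERT INTO dedup_demo (id, text1, text2) VALUES
  (1, '23', '43'),
  (2, '23', '44'),
  (3, '23', '44');

-- Delete every row for which another row with the same (text1, text2)
-- and a smaller id exists, so the lowest id of each group survives.
DELETE d
FROM dedup_demo AS d
JOIN dedup_demo AS keep
  ON  keep.text1 = d.text1
  AND keep.text2 = d.text2
  AND keep.id < d.id;

SELECT id, text1, text2 FROM dedup_demo ORDER BY id;
```

This keeps the lowest id in each group. Changing `keep.id < d.id` to `keep.id > d.id` keeps the highest id instead, which matches the question's expected output. Because `=` never matches NULL, rows with NULL in either column are left untouched. If those should be deduplicated too, `<=>` treats NULLs as equal. On 20 million rows, try it on a copy first and compare the timing with the NOT IN version.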

Upvotes: 1

halfer

Reputation: 20439

Run this:

-- replace t with your table name
SELECT COUNT(*), text1, text2
FROM t
GROUP BY text1, text2
HAVING COUNT(*) > 1;

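For each (text1, text2) pair this returns, delete one of its duplicate rows, then run the query again. Repeat until it returns no rows, because a pair that appears more than twice needs several passes.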

I'm not sure how this will perform; perhaps that doesn't matter if you do it offline.
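Rather than deleting one row per group and re-running the check, a single read-only query can list every surplus row at once, so you can review exactly what would be removed before deleting anything. This is a sketch assuming a hypothetical scratch table `dedup_demo` with the question's sample rows. It treats the lowest id in each group as the row to keep.

```sql
-- Hypothetical scratch table with the sample rows from the question.
DROP TABLE IF EXISTS dedup_demo;
CREATE TABLE dedup_demo (
  id    INT PRIMARY KEY,
  text1 VARCHAR(100),
  text2 VARCHAR(100)
);
INSERT INTO dedup_demo (id, text1, text2) VALUES
  (1, '23', '43'),
  (2, '23', '44'),
  (3, '23', '44');

-- One pass: list every row in a duplicated (text1, text2) group
-- except the one with the lowest id. Nothing is modified.
SELECT d.id, d.text1, d.text2
FROM dedup_demo AS d
JOIN (
  SELECT text1, text2, MIN(id) AS keep_id
  FROM dedup_demo
  GROUP BY text1, text2
  HAVING COUNT(*) > 1
) AS g
  ON  d.text1 = g.text1
  AND d.text2 = g.text2
  AND d.id <> g.keep_id
ORDER BY d.id;
```

For the sample data this lists only the row with id 3. An index on (text1, text2) should help the GROUP BY on a large table.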

Upvotes: 0

Rahul Tripathi

Reputation: 172458

You may try this:

ALTER IGNORE TABLE my_tablename ADD UNIQUE INDEX idx_name (text1 , text2);

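i.e., add a UNIQUE INDEX on the two columns while altering the table. With IGNORE, MySQL keeps only the first row of each set of rows that share a key value and deletes the rest. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so this statement only works on MySQL 5.6 and earlier.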

This has the added advantage that the unique index also prevents duplicate rows from being inserted in the future.
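On servers that reject ALTER IGNORE TABLE, a commonly used substitute is to copy the rows into a new table that already has the unique index, letting INSERT IGNORE skip the duplicates, and then swap the tables. This is a sketch under assumptions: `dedup_demo`, `dedup_demo_new` and `dedup_demo_old` are hypothetical names, and the columns are assumed to be VARCHAR rather than TEXT (a TEXT column would need a prefix length in the index, which would also treat rows sharing that prefix as duplicates).

```sql
-- Hypothetical scratch table with the sample rows from the question.
DROP TABLE IF EXISTS dedup_demo, dedup_demo_new, dedup_demo_old;
CREATE TABLE dedup_demo (
  id    INT PRIMARY KEY,
  text1 VARCHAR(100),
  text2 VARCHAR(100)
);
INSERT INTO dedup_demo (id, text1, text2) VALUES
  (1, '23', '43'),
  (2, '23', '44'),
  (3, '23', '44');

-- Empty copy of the table, with the unique index in place before any rows arrive.
CREATE TABLE dedup_demo_new LIKE dedup_demo;
ALTER TABLE dedup_demo_new ADD UNIQUE INDEX idx_text (text1, text2);

-- INSERT IGNORE skips each row whose (text1, text2) is already present;
-- ORDER BY id makes the lowest id of each group the one that is kept.
INSERT IGNORE INTO dedup_demo_new SELECT * FROM dedup_demo ORDER BY id;

-- Swap the tables in one statement; keep the old copy until the result is checked.
RENAME TABLE dedup_demo TO dedup_demo_old, dedup_demo_new TO dedup_demo;
```

Copying 20 million rows needs enough free disk space for a second copy of the table. INSERT IGNORE also turns other errors, such as values that do not fit a column, into warnings, so it is worth running `SHOW WARNINGS` afterwards.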

Upvotes: 5
