Yogzzz
Yogzzz

Reputation: 2785

Trouble deleting duplicate records in mysql

I have a products table which contains duplicate products by a column id_str and not id. We use the id_str to track each product. This is what I tried thus far:

Created a temp table and truncated it, then ran the following query

INSERT INTO products_temp SELECT DISTINCT id_str, id, title, url, image_url, long_descr, mp_seller_name, customer_rating, curr_item_price, base_item_price, item_num, rank, created_at, updated_at, published, publish_ready, categories, feed_id, category_names, last_published_at, canonical_url, is_curated, pr_attributes, gender, rating, stock_status, uploadedimage_file_name, updated_by, backfill_text, image_width, image_height, list_source, list_source_time, list_category, list_type, list_image, list_name, list_domain, notes, street_date, list_product_rank, created_by from products

And this moved everything over however when I searched the new table for duplicate id_str's:

SELECT id_str, COUNT(*) C FROM PRODUCTS GROUP BY id_str HAVING C > 1

I get the same result as I do on the original table. What am i missing?

Upvotes: 1

Views: 101

Answers (4)

Yogzzz
Yogzzz

Reputation: 2785

This is the simplest way I found to find and delete duplicates:

Note: Because of a bug with the InnoDB engine, for this to work you need to change your engine to MyISAM:

ALTER TABLE <table_name> ENGINE MyISAM

then add a unique index to the column you are trying to find dup's in using ignore:

ALTER IGNORE TABLE <table_name> ADD UNIQUE INDEX(`<column_name>`)

and change your db engine back:

ALTER TABLE <table_name> ENGINE InnoDB

and if you want you can delete the index you just created, but I would suggest also looking into what caused the duplicates in the first place.

Upvotes: 0

Alec Danyshchuk
Alec Danyshchuk

Reputation: 307

Try SELECT id_str, COUNT(*) C FROM PRODUCTS_TEMP GROUP BY id_str HAVING C > 1

In your case you are selecting again from the original table.

Upvotes: 0

Randy
Randy

Reputation: 16677

one or more of the other columns cause the rows being inserted to be unique. you are only testing the id_str in the count query,.

Upvotes: 1

Mark Byers
Mark Byers

Reputation: 838176

Using SELECT DISTINCT only removes duplicated entire rows. It doesn't remove a row if only one of the values is the same and the others are different.

Assuming that id is unique, try this instead:

 INSERT INTO products_temp
 SELECT id_str, id, title, url, -- etc
 FROM products
 WHERE id IN (SELECT MIN(id) FROM products GROUP BY id_str)

Upvotes: 0

Related Questions