Reputation: 85
I need to find and delete duplicates from within a table, yet keep a record of the IDs of the records being deleted and add them to a 'deleted_ids' field.
For example, here's part of a table I have currently:
+--------+-------------------------------------------------------+
| Id | Name |
+--------+-------------------------------------------------------+
| 9627 | John Smith |
| 9657 | John Smith |
| 24455 | John Smith |
| 67435 | John Smith |
+--------+-------------------------------------------------------+
The result I'd like to achieve is:
+--------+-----------------------------+-------------------------+
| Id | Name | Deleted_Ids |
+--------+-----------------------------+-------------------------+
| 9627 | John Smith | 9657,24455,67435 |
+--------+-----------------------------+-------------------------+
I'm open to any practical approach, if anyone can provide a solution or give suggestions I'd really appreciate it.
Thanks, Paul
Upvotes: 2
Views: 126
Reputation: 7116
The basic approach can be to first select all rows from old table, that qualifies the deletion criteria -> Insert these rows into new table -> delete all rows from old table
INSERT INTO new_table
SELECT id,name FROM old_table WHERE id >
(
SELECT min(id) FROM old_table B
WHERE A.name = B.name
);
delete FROM old_table WHERE id >
(
SELECT min(id) FROM old_table B
WHERE A.name = B.name
);
you can have auto incremented key in new table. I have not run these queries, pardon me if here are some errors, thats just for an idea. Having one to many relationship within these two tables can really be handy in future....and easier to query the data
Upvotes: 1
Reputation: 656586
If you want to proceed like this, here is a simple way.
Do it all in one transaction to be sure.
If there could be an entry of 'John Smith'
in tbl_del
already, you have to check and UPDATE
to add the deleted ids instead.
BEGIN;
INSERT INTO tbl_del
SELECT name, GROUP_CONCAT(id) AS deleted
FROM tbl_live
WHERE name = 'John Smith'
GROUP BY 1;
DELETE FROM tbl_live
WHERE name = 'John Smith';
COMMIT;
But I think Veseliq has a vaid point.
Upvotes: 2
Reputation: 34673
Don't do it with another column - storing IDs as comma separated list doesn't feel right, does it?
Best would be to have a second table with two columns PrimaryUserId and DeletedUserId - the PrimaryUserId would be "9627" for example - the one of the record you actually keep and you will add a row for every duplicate you delete from your main table. This approach would also allow for you to store a column that knows when the duplicate entry was deleted.
Upvotes: 4