Paul Grand
Paul Grand

Reputation: 85

SQL Delete duplicates - Keep Deleted IDs

I need to find and delete duplicates from within a table, yet keep a record of the IDs of the records being deleted and add them to a 'deleted_ids' field.

For example, here's part of a table I have currently:

+--------+-------------------------------------------------------+
| Id     | Name                                                  |
+--------+-------------------------------------------------------+
|   9627 |  John Smith                                           |
|   9657 |  John Smith                                           |
|  24455 |  John Smith                                           |
|  67435 |  John Smith                                           |
+--------+-------------------------------------------------------+

The result I'd like to achieve is:

+--------+-----------------------------+-------------------------+
| Id     | Name                        | Deleted_Ids             |
+--------+-----------------------------+-------------------------+
|   9627 |  John Smith                 | 9657,24455,67435        |
+--------+-----------------------------+-------------------------+

I'm open to any practical approach, if anyone can provide a solution or give suggestions I'd really appreciate it.

Thanks, Paul

Upvotes: 2

Views: 126

Answers (3)

Zohaib
Zohaib

Reputation: 7116

The basic approach can be to first select all rows from old table, that qualifies the deletion criteria -> Insert these rows into new table -> delete all rows from old table

INSERT INTO new_table
SELECT id,name FROM old_table WHERE id > 
(
  SELECT min(id) FROM old_table B
  WHERE A.name = B.name
);

delete FROM old_table WHERE id > 
(
  SELECT min(id) FROM old_table B
  WHERE A.name = B.name
);

you can have auto incremented key in new table. I have not run these queries, pardon me if here are some errors, thats just for an idea. Having one to many relationship within these two tables can really be handy in future....and easier to query the data

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656586

If you want to proceed like this, here is a simple way.
Do it all in one transaction to be sure.
If there could be an entry of 'John Smith' in tbl_del already, you have to check and UPDATE to add the deleted ids instead.

BEGIN;

INSERT INTO tbl_del
SELECT name, GROUP_CONCAT(id) AS deleted
FROM   tbl_live
WHERE  name = 'John Smith'
GROUP  BY 1;

DELETE FROM tbl_live
WHERE  name = 'John Smith';

COMMIT;

But I think Veseliq has a vaid point.

Upvotes: 2

ddinchev
ddinchev

Reputation: 34673

Don't do it with another column - storing IDs as comma separated list doesn't feel right, does it?

Best would be to have a second table with two columns PrimaryUserId and DeletedUserId - the PrimaryUserId would be "9627" for example - the one of the record you actually keep and you will add a row for every duplicate you delete from your main table. This approach would also allow for you to store a column that knows when the duplicate entry was deleted.

Upvotes: 4

Related Questions