Reputation: 478
This is a topic, which has been discussed several times in here, but for me it's difficult to understand how to delete duplicated rows in our MySQL database. Yes, I have seen many examples, but they are simplified to 2 columns etc. and the solution is most of the time just query, which has not been explained. As I want to securely clean our database. I have backed up my table, and now I wan't to delete duplicated rows (as I have many of them). Below is a snippet of query, which I have tried, and it is fairly simple: if duplicated rows, pick latest row and show ID of the row.
SELECT MAX(id) FROM Raw_Validated_backup GROUP BY col1, col2, col3, ... col_n-1 having COUNT(*) > 1;
Now, as I have selected latest row of duplicated rows, I 'just' have to delete the duplicated rows. Easier said that done, if you ask me. This is a query written by @Jose Rui Santos:
delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
He replied to this post: MySQL delete duplicate records but keep latest
Question is, how do I exactly delete duplicated rows and keep the latest? Prefer to be in a pseudo like format, so others can learn from this as well. For example:
DELETE table FROM database
And not specific like:
DELETE email FROM city
MySQL Queries
EXAMPLE 1:
DELETE t1 FROM Raw_Validated_backup AS t1 INNER JOIN Raw_Validated_backup AS t2
ON t1.time_start=t2.time_start
AND t1.time_end=t2.time_end
AND t1.first_temp_lpn=t2.first_temp_lpn
AND t1.first_WL=t2.first_WL
AND t1.first_temp_lpn_validated=t2.first_temp_lpn_validated
AND t1.second_temp_lpn=t2.second_temp_lpn
AND t1.second_WL=t2.second_WL
AND t1.second_temp_lpn_validated=t2.second_temp_lpn_validated
AND t1.third_temp_lpn=t2.third_temp_lpn
AND t1.third_WL=t2.third_WL
AND t1.third_temp_lpn_validated=t2.third_temp_lpn_validated
AND t1.first_temp_rising=t2.first_temp_rising
AND t1.first_WR=t2.first_WR
AND t1.first_temp_rising_validated=t2.first_temp_rising_validated
AND t1.second_temp_rising=t2.second_temp_rising
AND t1.second_WR=t2.second_WR
AND t1.second_temp_rising_validated=t2.second_temp_rising_validated
AND t1.third_temp_rising=t2.third_temp_rising
AND t1.third_WR=t2.third_WR
AND t1.third_temp_rising_validated=t2.third_temp_rising_validated
AND t1.id<t2.id;
EXAMPLE 2:
DELETE FROM Raw_Validated_backup WHERE id NOT IN (
SELECT max(id) FROM Raw_Validated_backup
GROUP BY time_start, time_end, first_temp_lpn, first_WL, first_temp_lpn_validated, second_temp_lpn, second_WL, second_temp_lpn_validated, third_temp_lpn, third_WL, third_temp_lpn_validated, first_temp_rising, first_WR, first_temp_rising_validated, second_temp_rising, second_WR, second_temp_rising_validated, third_temp_rising, third_WR, third_temp_rising_validated )
Upvotes: 1
Views: 997
Reputation: 2063
The final solution in @Jose Rui Santos's answer will do exactly what you need.
DELETE
FROM table_name
WHERE id NOT IN (
SELECT max(id)
FROM table_name
GROUP BY email
)
It will select all the IDs of the latest records for each group (the grouping being done on email in this case, but it can be on all of your columns as well), and then delete all the records that are not in the result of the query.
You can make sure it is, in fact, going to delete the right results by first just running the select:
SELECT max(id)
FROM table_name
GROUP BY email
This will show you everything that won't be deleted.
Upvotes: 0
Reputation: 562250
I'd do it this way:
mysql> select * from test;
+----+------------------+------+------+------+
| id | email | col2 | col3 | col4 |
+----+------------------+------+------+------+
| 1 | [email protected] | 2 | 3 | 4 |
| 2 | [email protected] | 2 | 3 | 4 |
| 3 | [email protected] | 2 | 3 | 4 |
| 4 | [email protected] | 2 | 3 | 4 |
+----+------------------+------+------+------+
mysql> DELETE t1 FROM test AS t1 INNER JOIN test AS t2
ON t1.email=t2.email AND t1.col2=t2.col2 AND t1.col3=t2.col3 AND t1.col4=t2.col4
AND t1.id<t2.id;
mysql> select * from test;
+----+------------------+------+------+------+
| id | email | col2 | col3 | col4 |
+----+------------------+------+------+------+
| 4 | [email protected] | 2 | 3 | 4 |
+----+------------------+------+------+------+
Explanation: The condition in the join looks for a row t2
such that the id is greater, and all other columns are equal. If such a row is found, then t1
must not be the row with the greatest id among the duplicates, therefore row t1
should be deleted.
Upvotes: 2