Reputation: 11
I'm trying to find and remove MySQL row duplicates that are right after each other, instead of finding all, even if they're not straight after each other.
SELECT DISTINCT(content) AS contentMsg, COUNT(*) AS cnt, `ticketId`,`date`
FROM ticketsReplies
WHERE username = 'X'
GROUP BY contentMsg, ticketId
HAVING cnt > 1
ORDER BY cnt DESC
This is my current code. However, this finds duplicates if there's just two of the same answers in one ticket instead of them having to be IDs right after each other (which can happen if you send a POST request, and it fails, and you refresh etc).
How would I go about finding ones that are only 1 ID from each other.
So finding e.g. 1,2,3,4,5,6,7 instead of 1,3,9,11
E.g. if you have
ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 bbb
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 bbb
If you have this, it should find the following IDs: 2,3,4 but not 9 as it's not directly after 4 even though its a duplicate.
It should also find 7,8 but not 1 as they are not right after each other.
Upvotes: 1
Views: 56
Reputation: 4157
If there are gaps in your id
list (eg 5, 6, 9, 11
), simply comparing id = id+1
wouldn't work. The solution I came up with is to create two identical temporary tables with sequential row-numbers. In that case you can safely compare the rows based on their number, even if the id's have gaps.
DELETE FROM tab WHERE id IN (
SELECT A.id
FROM
(
SELECT row_nr, id, email FROM (
SELECT
(@cnt1 := @cnt1 + 1) AS row_nr,
t.id,t.email
FROM tab AS t
CROSS JOIN (SELECT @cnt1 := 0) AS d
ORDER BY t.id
) x
) A
INNER JOIN
(
SELECT row_nr, id, email FROM (
SELECT
(@cnt2 := @cnt2 + 1) AS row_nr,
t.id,t.email
FROM tab AS t
CROSS JOIN (SELECT @cnt2 := 0) AS d
ORDER BY t.id
) x
) B
ON A.row_nr-1 = B.row_nr AND A.email=B.email
)
The two (SELECT row_nr, id, email FROM ... ) x
parts create two identical tables A and B like
row_nr id email
1 1 aaa
2 4 aaa
3 5 bbb
4 9 aaa
5 11 aaa
Then you can compare the sequential row-nr's and email:
ON A.row_nr-1 = B.row_nr AND A.email=B.email
Selecting the result-id's gives you the id's 4, 11
which are the duplicates. Then you can delete those id's:
DELETE FROM tab WHERE id IN ( ... )
Here is a Fiddle to test the SELECT part.
NOTE: Before you try this at home, please backup your table!
Upvotes: 0
Reputation: 33945
E.g.:
SELECT id
FROM
( SELECT x.id FROM my_table x JOIN my_table y ON y.email = x.email AND y.id = x.id + 1 ) a
UNION
( SELECT y.id FROM my_table x JOIN my_table y ON y.email = x.email AND y.id = x.id + 1 );
Upvotes: 1