The Doe
The Doe

Reputation: 11

Find and remove MySQL row duplicates that are right after each other

I'm trying to find and remove MySQL row duplicates that are right after each other, instead of finding all, even if they're not straight after each other.

SELECT DISTINCT(content) AS contentMsg, COUNT(*) AS cnt, `ticketId`,`date` 
FROM ticketsReplies 
WHERE username = 'X' 
GROUP BY contentMsg, ticketId 
HAVING cnt > 1 
ORDER BY cnt DESC

This is my current code. However, this finds duplicates if there's just two of the same answers in one ticket instead of them having to be IDs right after each other (which can happen if you send a POST request, and it fails, and you refresh etc).

How would I go about finding ones that are only 1 ID from each other.

So finding e.g. 1,2,3,4,5,6,7 instead of 1,3,9,11

E.g. if you have

ID                     EMAIL                
---------------------- -------------------- 
1                      aaa                  
2                      bbb                  
3                      bbb                  
4                      bbb                  
5                      ddd                  
6                      eee                  
7                      aaa                  
8                      aaa                  
9                      bbb

If you have this, it should find the following IDs: 2,3,4 but not 9 as it's not directly after 4 even though its a duplicate.

It should also find 7,8 but not 1 as they are not right after each other.

Upvotes: 1

Views: 56

Answers (2)

Michel
Michel

Reputation: 4157

If there are gaps in your id list (eg 5, 6, 9, 11), simply comparing id = id+1 wouldn't work. The solution I came up with is to create two identical temporary tables with sequential row-numbers. In that case you can safely compare the rows based on their number, even if the id's have gaps.

DELETE FROM tab WHERE id IN (

SELECT A.id
FROM 
    (
    SELECT row_nr, id, email FROM (
        SELECT
            (@cnt1 := @cnt1 + 1) AS row_nr,
            t.id,t.email
        FROM tab AS t
          CROSS JOIN (SELECT @cnt1 := 0) AS d
        ORDER BY t.id
        ) x
    ) A
INNER JOIN 
    (
    SELECT row_nr, id, email FROM (
        SELECT
            (@cnt2 := @cnt2 + 1) AS row_nr,
            t.id,t.email
        FROM tab AS t
          CROSS JOIN (SELECT @cnt2 := 0) AS d
        ORDER BY t.id
        ) x
    ) B
ON A.row_nr-1 = B.row_nr AND A.email=B.email

)

The two (SELECT row_nr, id, email FROM ... ) x parts create two identical tables A and B like

row_nr   id  email
 1       1   aaa
 2       4   aaa
 3       5   bbb
 4       9   aaa
 5       11  aaa

Then you can compare the sequential row-nr's and email:

ON A.row_nr-1 = B.row_nr AND A.email=B.email

Selecting the result-id's gives you the id's 4, 11 which are the duplicates. Then you can delete those id's:

DELETE FROM tab WHERE id IN ( ... )

Here is a Fiddle to test the SELECT part.

NOTE: Before you try this at home, please backup your table!

Upvotes: 0

Strawberry
Strawberry

Reputation: 33945

E.g.:

SELECT id 
  FROM 
     ( SELECT x.id  FROM my_table x JOIN my_table y ON y.email = x.email AND y.id = x.id + 1 ) a 
 UNION 
     ( SELECT y.id FROM my_table x JOIN my_table y ON y.email = x.email AND y.id = x.id + 1 );

Upvotes: 1

Related Questions