Atif
Atif

Reputation: 10880

Deleting Duplicates in MySQL

I have a table like this

userid  visitorid   time
1       10          2009-12-23
1       18          2009-12-06
1       18          2009-12-14
1       18          2009-12-18
1705    1678        2010-01-24
1705    1699        2010-01-24
1705    1700        2010-01-24
1712    1           2010-01-25
1712    640         2010-01-24
1712    925         2010-01-25
1712    1600        2010-01-24
1712    1630        2010-01-25
1712    1630        2010-01-24
1713    1           2010-01-24
1713    1           2010-01-23

I would like to perform a query such that it removes all the duplicates except for the latest one. I Hope you get an idea?

Example, after the query the table must be like this

userid  visitorid   time
1       10          2009-12-23
1       18          2009-12-18
1705    1678        2010-01-24
1705    1699        2010-01-24
1705    1700        2010-01-24
1712    1           2010-01-25
1712    640         2010-01-24
1712    925         2010-01-25
1712    1600        2010-01-24
1712    1630        2010-01-25
1713    1           2010-01-24

Upvotes: 2

Views: 371

Answers (4)

bignose
bignose

Reputation: 32367

You need to work around MySQL bug#6980, with a doubly nested subquery:

DELETE FROM foo_table
WHERE foo_table.time IN (
    SELECT time FROM (
        SELECT time FROM
            foo_table
            LEFT OUTER JOIN (
                SELECT MAX(time) AS time
                FROM foo_table
                GROUP BY userid, visitorid
                ) AS foo_table_keep
                    USING (time)
        WHERE
            foo_table_keep.time IS NULL
        ) AS foo_table_delete
    );

Using GROUP BY collapses duplicates down to a single row, and MAX(time) chooses which value you want. Use another aggregate function than MAX if you want.

Wrapping the subquery twice, providing aliases for each, avoids the error:

ERROR 1093 (HY000): You can't specify target table 'foo_table' for update in FROM clause

and has the extra advantage that it's clearer how the statement is choosing what to keep.

Upvotes: 0

soulmerge
soulmerge

Reputation: 75774

Assuming your table is called Visitors:

DELETE v1.* FROM Visitors v1
LEFT JOIN (
    SELECT userid, visitorid, MAX(time) AS time
    FROM Visitors v2
    GROUP BY userid, visitorid
) v3 ON v1.userid=v3.userid AND v1.visitorid=v3.visitorid AND v1.time = v3.time
WHERE v3.userid IS NULL;

Upvotes: 0

DRapp
DRapp

Reputation: 48179

Delete from YourTable VersionA
  where VersionA.Time NOT IN
    ( select MAX( VersionB.Time ) Time
         from YourTable VersionB
         where VersionA.UserID = VersionB.UserID
           and VersionA.VisitorID = VersionB.VisitorID )

Syntax might need to be adjusted, but SHOULD do the trick. Additionally, you may want to pre-query the Subselect into its own table FIRST, then run the DELETE FROM against that result set.

Upvotes: 4

Quassnoi
Quassnoi

Reputation: 425813

DELETE  mo.*
FROM    (
        SELECT  userid, visitorid, MAX(time) AS mtime
        FROM    mytable
        GROUP BY
                userid, visitorid
        ) mi
JOIN    mytable mo
ON      mo.userid = mi.userid
        AND mo.visitorid = mo.visitorid
        AND mo.time < mi.mtime

Upvotes: 0

Related Questions