Eric Cope
Eric Cope

Reputation: 877

Selecting Rows with Duplicate Data

I've got some MySQL tables with redundant data that I need to remove. For example:

 id email            date       data...
 1  [email protected] 2012-01-01 my_data
 2  [email protected] 2012-01-01 my_data
 3  [email protected] 2012-01-02 my_data
 4  [email protected] 2012-01-02 my_data   (redundant)
 5  [email protected] 2012-01-02 my_data

I need to DELETE the redundant rows, but I'd like to select them first. I found this on StackOverflow, but it requires the email address

SELECT * 
FROM `my_table`
WHERE `id` IN (SELECT `id` 
               FROM `my_table` 
               where `email` = '[email protected]' 
               group by `date` 
               HAVING count(*) > 1)

What query can i use like above that does not use the WHERE qualifier in the embedded query so I can do it fall all email addresses?

The query can be a SELECT query. I don't mind removing the rows manually in PHPMyAdmin.

Upvotes: 2

Views: 825

Answers (2)

GregD
GregD

Reputation: 2877

Another approach is to count the number of occurrences of the date column for each email address in your table:

SELECT `email`, `date`, COUNT(*) FROM `my_table` GROUP BY `date`, `email` HAVING COUNT(*) > 1

+------------------+---------------------+----------+
| email            | date                | COUNT(*) |
+------------------+---------------------+----------+
| [email protected] | 2012-01-02 00:00:00 |        2 |
+------------------+---------------------+----------+

Upvotes: 0

John Woo
John Woo

Reputation: 263933

DELETE FROM tableName
WHERE ID NOT IN
(
    SELECT minID
    FROM
    (
        SELECT email, date, MIN(id) minID
        FROM tableNAme
        GROUP BY email, date
    ) x
)

or by using JOIN

DELETE a 
FROM tableName a
    LEFT JOIN (
            SELECT minID
            FROM (
                    SELECT email, DATE, MIN(id) minID
                    FROM tableNAme
                    GROUP BY email, DATE
                    ) y
            ) x
            ON a.ID = x.minID
WHERE x.minID IS NULL;

The following query only SELECT duplicated rows for each email and date

SELECT a.*
FROM tableName a
        LEFT JOIN 
       ( 
         SELECT minID
        FROM
        (
          SELECT email, date, MIN(id) minID
          FROM tableNAme
          GROUP BY email, date
        )y
       ) x ON a.ID = x.minID
WHERE x.minID IS NULL

Upvotes: 7

Related Questions