David542
David542

Reputation: 110093

Removing redundant information, difficult SQL query

I have a table in the format:

`users`
- id
- views
- email

Here are some entries I have:

`id`        `email`                 `likes`
9570800   [email protected]      888
5355388   [email protected]     3042
8610061   [email protected]         450
6189810   [email protected] 0
6625684   [email protected]        0
16841675  [email protected]        40
9716153   [email protected]        0
9716353                              0
97161453                             0

I want to DELETE all entries that have a duplicate email and keep the entry with the highest likes. For example, in the above, the duplicates for [email protected] would mean 6625684 and 9716153 would both be deleted, and 16841675 would be kept (since it has the most likes out of the duplicates.

In addition, if the email is blank ('' or NULL), ignore it / do not delete it.

How would I accomplish this query?

Upvotes: 1

Views: 74

Answers (3)

000
000

Reputation: 27227

First, be careful. Review the list of potential deletions:

SELECT * FROM users
INNER JOIN (
  SELECT email, max(likes) max_likes FROM users GROUP BY email HAVING count(*) > 1
) keep ON users.email = keep.email
WHERE users.likes <> keep.max_likes

Then, if that looks okay to you, substitute SELECT * with DELETE users

Upvotes: 0

dfb
dfb

Reputation: 13289

DELETE FROM Users WHERE NOT userid in (
    SELECT userid FROM Users u 
        INNER JOIN (SELECT userid,max(likes) ml FROM Users GROUP BY userid) u2
           ON u.userid = u2.userid AND u.likes = u2.ml
    )

Upvotes: 0

Bill Karwin
Bill Karwin

Reputation: 562230

DELETE lesser FROM MyTable AS greater JOIN MyTable AS lesser USING (email)
WHERE greater.email <> '' AND greater.likes > lesser.likes;

Upvotes: 2

Related Questions