Reputation: 363
I have the following table structure :
I would like to be able to delete from the table the duplicated mails, leaving for each mail account only the one with the highest quality score. At the moment I have come up with the following SQL code :
DELETE *
FROM Table
WHERE ( Table.[Email Adress] & Table.[Quality Score] ) NOT IN
(
SELECT (Table.[Email Adress] & Max(Table.[Quality Score])
FROM Table
GROUP BY Table.[Email Adress]
);
However when I run it, it asks me for a parameter value and clearly doesn't work as I intended.
Do you have any solution?
Upvotes: 0
Views: 419
Reputation: 164064
You can simplify your query to this:
DELETE FROM Table AS t
WHERE t.[Quality Score] <> (
SELECT Max([Quality Score])
FROM Table
WHERE [Email Adress] = t.[Email Adress]
);
No need to GROUP BY [Email Adress]
but you need a WHERE
clause.
Or with EXISTS
:
DELETE FROM Table AS t
WHERE EXISTS (
SELECT 1 FROM Table
WHERE [Email Adress] = t.[Email Adress] AND [Quality Score] > t.[Quality Score]
);
In case there are duplicate scores then you can keep the row with the highest score and the lowest id like this:
DELETE FROM Table AS t
WHERE EXISTS (
SELECT 1 FROM Table
WHERE [Email Adress] = t.[Email Adress]
AND ([Quality Score] > t.[Quality Score] OR ([Quality Score] = t.[Quality Score] AND id < t.id))
);
Upvotes: 1
Reputation: 1269493
One method uses a correlated subquery:
delete from t
where t.quality_score < (select max(t2.quality_score)
from t as t2
where t2.email_address = t.email_address
);
Note: If you have duplicate highest scores, this keeps all of them. To address, you can use the id
column:
delete from t
where t.id <> (select top 1 t2.id
from t as t2
where t2.email_address = t.email_address
order by t2.quality_score desc, id
);
Upvotes: 1