Reputation: 51
I am trying to delete duplicate names under certain criteria:
Not the best with SQL but have cobbled this together.
There's only a combined primary key combining first_name and another random column. So ctid is used as a reference to the row from my understanding. I group reoccurring names where task is null and then select only the latest of each group with MAX(date_started). I delete all duplicate rows (a.name = b.name) meeting the criteria that task is null (a.task IS NULL) except the one I had selected by using a.ctid <> b.ctid (b.ctid being the selected newest row by date_started). At least this is what I was hoping to do. It's not actually working.
DELETE FROM player_assignments a
USING (SELECT MAX(date_started) as ctid, date_started, name, task
FROM player_assignments WHERE task IS NULL GROUP BY name HAVING COUNT(*) > 1) b
WHERE a.name = b.name AND a.ctid <> b.ctid AND a.task IS NULL
With different minor tweaks, I keep getting two errors in particular.
psycopg2.errors.GroupingError: column "player_assignments.date_started" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: ...ING (SELECT MAX(date_started) as ctid, da...
psycopg2.errors.UndefinedFunction: operator does not exist: tid <> timestamp with time zone LINE 1: ...me = name AND a.ctid <> b.ctid ...
Upvotes: 1
Views: 101
Reputation: 1270421
I think I would approach this using a correlated subquery:
delete from player_assignments pa
where pa.task is null and
pa.date_started < (select max(pa2.date_started)
from player_assignments pa2
where pa2.name = pa.name and
pa2.task is null
);
Note: This assumes that you want to keep one row with null
even if other rows with non-NULL
exist for the name. If you want to only keep NULL
rows when it is the most recent among all rows, then remove the pa2.task is null
from the subquery.
Upvotes: 2
Reputation: 10212
Try this one:
DELETE FROM player_assignments a
USING (
SELECT
name,
MAX(date_started) as ctid
FROM player_assignments
WHERE task IS NULL
GROUP BY name
HAVING COUNT(*) > 1
) b
WHERE a.name = b.name
AND a.date_started <> b.ctid
AND a.task IS NULL
Upvotes: 1