Reputation: 6351
I would like to know, in Postgres, how to delete all duplicate records but one by sorting by a column.
Let's say I have the following table foo
:
id | name | region | created_at
--------------------+------+-----------+-------------------------------
1 | foo | sydney | 2018-05-24 15:40:32.593745+10
2 | foo | melbourne | 2018-05-24 17:28:59.452225+10
3 | foo | sydney | 2018-05-29 22:17:02.927263+10
4 | foo | sydney | 2018-06-13 16:44:32.703174+10
5 | foo | sydney | 2018-06-13 16:45:01.324273+10
6 | foo | sydney | 2018-06-13 17:04:49.487767+10
7 | foo | sydney | 2018-06-13 17:05:13.592844+10
I would like to delete all duplicates by checking for (name, region) tuple, but keep the one with the greatest created_at
column. The result would be:
id | name | region | created_at
--------------------+------+-----------+-------------------------------
2 | foo | melbourne | 2018-05-24 17:28:59.452225+10
7 | foo | sydney | 2018-06-13 17:05:13.592844+10
But I do not know where to begin. Any ideas?
Upvotes: 2
Views: 385
Reputation: 44699
Use a subquery with ROW_NUMBER
and PARTITION BY
to filter out rows that have duplicate regions while retaining the most recent in each region. Ensure your subquery uses the AS
keyword to prevent Postgre syntax errors:
SELECT *
FROM foo
WHERE id IN (
SELECT a.id
FROM (
SELECT id, ROW_NUMBER() OVER (
PARTITION BY region
ORDER BY created_at DESC
) row_no
FROM foo
) AS a
WHERE row_no > 1
);
... returns the rows to be deleted. Replace SELECT *
with DELETE
when you're satisfied with the result to delete the rows.
Upvotes: 1
Reputation: 172
DELETE FROM foo
WHERE id IN
(SELECT id
FROM (SELECT id,
ROW_NUMBER ()
OVER (PARTITION BY region
ORDER BY created_at DESC)
row_no
FROM foo)
WHERE row_no > 1)
Upvotes: 1