Postgres delete all duplicate records but one by sorting

Question

I would like to know, in Postgres, how to delete all duplicate records but one by sorting by a column.

Let's say I have the following table foo:

 id                 | name |  region   |          created_at
--------------------+------+-----------+-------------------------------
                  1 | foo  | sydney    | 2018-05-24 15:40:32.593745+10
                  2 | foo  | melbourne | 2018-05-24 17:28:59.452225+10
                  3 | foo  | sydney    | 2018-05-29 22:17:02.927263+10
                  4 | foo  | sydney    | 2018-06-13 16:44:32.703174+10
                  5 | foo  | sydney    | 2018-06-13 16:45:01.324273+10
                  6 | foo  | sydney    | 2018-06-13 17:04:49.487767+10
                  7 | foo  | sydney    | 2018-06-13 17:05:13.592844+10

I would like to delete all duplicates by checking for (name, region) tuple, but keep the one with the greatest created_at column. The result would be:

 id                 | name |  region   |          created_at
--------------------+------+-----------+-------------------------------
                  2 | foo  | melbourne | 2018-05-24 17:28:59.452225+10
                  7 | foo  | sydney    | 2018-06-13 17:05:13.592844+10

But I do not know where to begin. Any ideas?

esqew · Accepted Answer

Use a subquery with ROW_NUMBER and PARTITION BY to filter out rows that have duplicate regions while retaining the most recent in each region. Ensure your subquery uses the AS keyword to prevent Postgre syntax errors:

SELECT * 
FROM foo 
WHERE id IN (
  SELECT a.id 
  FROM (
    SELECT id, ROW_NUMBER() OVER (
        PARTITION BY region 
        ORDER BY created_at DESC
    ) row_no 
    FROM foo
  ) AS a 
  WHERE row_no > 1
);

... returns the rows to be deleted. Replace SELECT * with DELETE when you're satisfied with the result to delete the rows.

SQLFiddle demo

Postgres delete all duplicate records but one by sorting

Answers (2)

Related Questions