mc9
mc9

Reputation: 6351

Postgres delete all duplicate records but one by sorting

I would like to know, in Postgres, how to delete all duplicate records but one by sorting by a column.

Let's say I have the following table foo:

 id                 | name |  region   |          created_at
--------------------+------+-----------+-------------------------------
                  1 | foo  | sydney    | 2018-05-24 15:40:32.593745+10
                  2 | foo  | melbourne | 2018-05-24 17:28:59.452225+10
                  3 | foo  | sydney    | 2018-05-29 22:17:02.927263+10
                  4 | foo  | sydney    | 2018-06-13 16:44:32.703174+10
                  5 | foo  | sydney    | 2018-06-13 16:45:01.324273+10
                  6 | foo  | sydney    | 2018-06-13 17:04:49.487767+10
                  7 | foo  | sydney    | 2018-06-13 17:05:13.592844+10

I would like to delete all duplicates by checking for (name, region) tuple, but keep the one with the greatest created_at column. The result would be:

 id                 | name |  region   |          created_at
--------------------+------+-----------+-------------------------------
                  2 | foo  | melbourne | 2018-05-24 17:28:59.452225+10
                  7 | foo  | sydney    | 2018-06-13 17:05:13.592844+10

But I do not know where to begin. Any ideas?

Upvotes: 2

Views: 385

Answers (2)

esqew
esqew

Reputation: 44699

Use a subquery with ROW_NUMBER and PARTITION BY to filter out rows that have duplicate regions while retaining the most recent in each region. Ensure your subquery uses the AS keyword to prevent Postgre syntax errors:

SELECT * 
FROM foo 
WHERE id IN (
  SELECT a.id 
  FROM (
    SELECT id, ROW_NUMBER() OVER (
        PARTITION BY region 
        ORDER BY created_at DESC
    ) row_no 
    FROM foo
  ) AS a 
  WHERE row_no > 1
);

... returns the rows to be deleted. Replace SELECT * with DELETE when you're satisfied with the result to delete the rows.

SQLFiddle demo

Upvotes: 1

AmeyaN99
AmeyaN99

Reputation: 172

DELETE FROM foo
      WHERE id IN
               (SELECT id
                  FROM (SELECT id,
                               ROW_NUMBER ()
                               OVER (PARTITION BY region
                                     ORDER BY created_at DESC)
                                  row_no
                          FROM foo)
                 WHERE row_no > 1)

Upvotes: 1

Related Questions