Drakron
Drakron

Reputation: 95

How to remove duplicated rows in Postgres by two fields

I need to remove only duplicated rows in a Postgres database, but I'm having some dificult to do this. I have some data rows that are duplicated and I need to remove them and keep only one by removing only the duplicated rows.

Example:

TableName="RegionCustomer"

Row - RegionId - CustomerId
 1  - REG1000  - CUS3000 
 2  - REG1000  - CUS4000
 3  - REG1000  - CUS3000
 4  - REG2000  - CUS3000

You can see in the row 1 and 3 that the values RegionId and CustomerId are duplicated, and I want to delete them. But i dont know how to do this :/

I have tried to group them by RegionId and CustomerId and this works, but don't know how to delete them.

[OriginalTable]

Row - RegionId - CustomerId
 1  - REG1000  - CUS3000 
 2  - REG1000  - CUS4000
 3  - REG1000  - CUS3000
 4  - REG2000  - CUS3000

[WhatINeed]

Row - RegionId - CustomerId
 1  - REG1000  - CUS3000 
 2  - REG1000  - CUS4000
 3  - REG2000  - CUS3000

Upvotes: 1

Views: 52

Answers (1)

A l w a y s S u n n y
A l w a y s S u n n y

Reputation: 38502

You can delete this way the duplicate rows by keeping single row of the RegionId and CustomerId combination,

What it does:

  • Rank the rows based on RegionId, CustomerId group a.k.a the window
  • Select only the id column of each window where the row_number is > 1
  • Delete only those rows from the RegionCustomer table.

QUERY:

DELETE FROM RegionCustomer
WHERE id IN
    (
      SELECT id
      FROM 
        (
         SELECT id,
         ROW_NUMBER() OVER( PARTITION BY RegionId,
         CustomerId
         ORDER BY  id ) AS row_num
         FROM RegionCustomer ) t
         WHERE t.row_num > 1 
    );

Note: Replace id with Row if that column name is Row instead of id,

Upvotes: 4

Related Questions