SQL - Redshift remove duplicate rows without primary key

Question

I have a big redshift database. the records do not have unique key.

I'd like to convert the below sql to postgresql.

DELETE FROM (
SELECT *,ROW_NUMBER() OVER (partition BY column1, column2, column3) AS rnum
FROM tablename ) t1
WHERE t1.rnum > 1;

perform the above sql,the following errors appear :

Amazon Invalid operation: syntax error at or near "(";

please :'( .

Gordon Linoff · Accepted Answer

In Postgres, you can do this using ctid. This is a system "column" that physically identifies each row.

The idea is:

delete from tablename
    where ctid not in (select min(t2.ctid)
                       from tablename t2
                       group by column1, column2, column3
                      );

I am not sure if Redshift supports ctid. But then again, despite the tags, your question is explicitly about Postgres.

SQL - Redshift remove duplicate rows without primary key

Answers (1)

Related Questions