Webdev Tory
Webdev Tory

Reputation: 515

Postgres remove composite duplicates

In my populated database I have a table originally with this schema:

CREATE TABLE tale_references (
    id SERIAL PRIMARY KEY,
    creation_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    tale_id TEXT REFERENCES tale_catalog(tale_id) ON DELETE CASCADE,
    reference_id INT REFERENCES FT_References(id) ON DELETE CASCADE,
    tale_catalog_id INT REFERENCES tale_catalog(id) ON DELETE CASCADE,
    other_info JSONB);

I need to add to it the following line:

CONSTRAINT no_duplicate_tale UNIQUE (tale_catalog_id, reference_id))

However, the data already in the database has entries that break this constraint. How can I find these entries so that I can remove them?

Upvotes: 2

Views: 1049

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270713

I would do this as:

delete from tale_references t
    where exists (select 1
                  from tale_references t2
                  where t2.tale_catalog_id = t.tale_catalog_id and
                        t2.reference_id = t.reference_id and
                        t2.id < t.id
                 );

This deletes any row where the tale_catalog_id/reference_id pair has a smaller value. In other words, it keeps the smallest value for each pair.

Upvotes: 4

Amith Kumar
Amith Kumar

Reputation: 4902

Very easy way is to use EXISTS clause. Here is the query I formulated for you:

DELETE
FROM tale_references
WHERE id IN
  ( SELECT t1.id
  FROM tale_references t1
  WHERE EXISTS
    (SELECT 1
    FROM tale_references t2
    WHERE t1.id           <> t2.id
    AND t1.tale_catalog_id = t2.tale_catalog_id
    AND t1.reference_id    = t2.reference_id
    )
  );

Upvotes: 0

Related Questions