Reputation: 3740
I'm wondering what the best practice would be for avoiding duplication inside of a PostgreSQL database table. The rows in the table are equivalent to entries from a 3rd party API which is called several times throughout the day.
I am only adding entries from the JSON response that are not duplicates. The JSON response can contain upwards of 500 objects, let's say:
[{
    "unique_id": 1234,
    "name": "A",
    "wins": 1
},
{
    "unique_id": 1235,
    "name": "B",
    "wins": 1
},
...
]
The issue is that the API may return data I've already stored in the database, and when I query the third-party endpoint again, I don't want data that has already been inserted to be added a second time - only the objects in the result that are new.
And I'm aware of how I could check whether one individual object already exists as a row, but when the third-party response contains upwards of 500 objects and only 50 of them are new, what would be a reasonable way to do this check in one database transaction?
I was tempted to go so far as to SELECT * FROM table_name, merge those results with the JSON response, and then remove the duplicates that way, but that seems like an awful lot of processing overhead.
Any help would be appreciated.
Upvotes: 3
Views: 1607
Reputation: 53735
Set a UNIQUE constraint on the column that needs to be unique, and then insert with ON CONFLICT DO NOTHING (or ON CONFLICT DO UPDATE, which is likely what you actually want here, since values like wins can change between API calls).
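For example, a minimal upsert sketch, assuming unique_id alone identifies an entry and wins is the value that changes between API calls:
create table mytable (unique_id int primary key, name text, wins int);
-- insert a new row, or refresh wins when unique_id already exists;
-- excluded refers to the row that was proposed for insertion
insert into mytable (unique_id, name, wins)
values (1234, 'A', 1)
on conflict (unique_id) do update
    set wins = excluded.wins;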
Upvotes: 1
Reputation: 222722
A simple way to proceed is to declare a unique constraint (or a unique index, or a primary key) on the relevant columns of the table. You can then use the on conflict do nothing clause in your insert queries:
Example:
create table mytable(unique_id int, name text, wins int);
create unique index myidx on mytable(unique_id, name, wins);
insert into mytable (unique_id, name, wins)
values(1234, 'A', 1)
on conflict(unique_id, name, wins) do nothing;
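To handle the whole payload in one statement rather than row by row, you could also unpack the JSON server-side. A sketch using jsonb_to_recordset, assuming the API response is passed in as a jsonb literal (the two objects here stand in for your actual 500):
insert into mytable (unique_id, name, wins)
select unique_id, name, wins
from jsonb_to_recordset('[
  {"unique_id": 1234, "name": "A", "wins": 1},
  {"unique_id": 1235, "name": "B", "wins": 1}
]'::jsonb) as r(unique_id int, name text, wins int)
on conflict (unique_id, name, wins) do nothing;
Only the rows that don't collide with the unique index are inserted; the duplicates are skipped silently, all in one transaction.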
Upvotes: 2