Reputation: 3740
I'm wondering what the best practice would be for avoiding duplication inside of a PostgreSQL database table. The rows in the table are equivalent to entries from a 3rd party API which is called several times throughout the day.
I am only adding entries from the JSON response that are not duplicates. The JSON response can contain upwards of 500 objects, let's say:
[{
    "unique_id": 1234,
    "name": "A",
    "wins": 1
},
{
    "unique_id": 1235,
    "name": "B",
    "wins": 1
},
...
]
The issue is that the API may return data I've already stored in the database, and when I query the third-party endpoint again, I don't want data that has already been inserted to be added a second time - only the objects in the result that are new.
And I'm aware of how I could check whether one individual object already exists as a row, but when the third-party response contains upwards of 500 objects and only 50 of them are new, what would be a reasonable way to do this check in one database transaction?
I was tempted to go so far as to SELECT * FROM table_name, merge those results with the JSON response, and then remove the duplicates that way, but that seems like an awful lot of processing overhead.
Any help would be appreciated.
Upvotes: 3
Views: 1607
Reputation: 53735
Set a UNIQUE constraint on the column that needs to be unique, and then insert with ON CONFLICT DO NOTHING (or ON CONFLICT DO UPDATE, which is likely what you actually want here, since values like wins can change between API calls).
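For example, a minimal upsert sketch, assuming unique_id alone identifies an entry and wins is the value that changes between API calls:
create table mytable (unique_id int primary key, name text, wins int);
-- insert a new row, or refresh wins when unique_id already exists;
-- excluded refers to the row that was proposed for insertion
insert into mytable (unique_id, name, wins)
values (1234, 'A', 1)
on conflict (unique_id) do update
    set wins = excluded.wins;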
Upvotes: 1
Reputation: 222722
A simple way to proceed is to declare a unique constraint (or a unique index, or a primary key) on the relevant columns of the table. You can then use the on conflict do nothing clause in your insert queries:
Example:
create table mytable(unique_id int, name text, wins int);
create unique index myidx on mytable(unique_id, name, wins);
insert into mytable (unique_id, name, wins)
values(1234, 'A', 1)
on conflict(unique_id, name, wins) do nothing;
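To handle the whole payload in one statement rather than row by row, you could also unpack the JSON server-side. A sketch using jsonb_to_recordset, assuming the API response is passed in as a jsonb literal (the two objects here stand in for your actual 500):
insert into mytable (unique_id, name, wins)
select unique_id, name, wins
from jsonb_to_recordset('[
  {"unique_id": 1234, "name": "A", "wins": 1},
  {"unique_id": 1235, "name": "B", "wins": 1}
]'::jsonb) as r(unique_id int, name text, wins int)
on conflict (unique_id, name, wins) do nothing;
Only the rows that don't collide with the unique index are inserted; the duplicates are skipped silently, all in one transaction.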
Upvotes: 2