Reputation: 149
I have a Postgres database table (users) with the data shown below:

id  user_id  status
1   1        Green
2   2        Green
3   1        Blue
4   1        Green
A few unexpected concurrent INSERT queries are being executed against this table, like the ones below (we have no control over what triggers these queries):

17:38:31.927 -> INSERT INTO members (user_id, status) VALUES (1, 'Green');
17:38:31.927 -> INSERT INTO members (user_id, status) VALUES (1, 'Green');
This results in two new records added to the table:
id  user_id  status
1   1        Green
2   2        Green
3   1        Blue
4   1        Green
5   1        Green
6   1        Green
I want to prevent these duplicate records from being created. More precisely, a new record with a given user_id and status should not be created if the latest existing record for that user_id already has that same status.
I cannot manage this logic at the application level because the queries run concurrently.
I have explored unique constraints and upsert in Postgres, but didn't find a solution for the above scenario.
EDIT -
Why cannot add a unique constraint on user_id and status?
Ans. There can legitimately be duplicate (user_id, status) pairs. Suppose 'Green' is the latest status of user_id 1 (as shown in the first table); then we cannot insert another 'Green' for that user_id. But we can insert 'Blue' and afterwards insert 'Green' again, at which point there are two records with user_id 1 and status 'Green'. Meanwhile, multiple identical queries are triggered concurrently, so if one of them executes first, the remaining ones should fail because they have the same user_id and status.
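To make this concrete, here is a minimal sketch (using Python's built-in sqlite3 module as a stand-in for Postgres; the table and column names follow the question) showing why a plain UNIQUE (user_id, status) constraint is too strict for the Green -> Blue -> Green sequence described above:

```python
import sqlite3

# In-memory table with a naive unique constraint on (user_id, status).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE members (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER,
        status  TEXT,
        UNIQUE (user_id, status)
    )
""")

conn.execute("INSERT INTO members (user_id, status) VALUES (1, 'Green')")
conn.execute("INSERT INTO members (user_id, status) VALUES (1, 'Blue')")

try:
    # Legitimate insert: the latest status for user 1 is now 'Blue',
    # so 'Green' should be allowed again -- but the constraint rejects it.
    conn.execute("INSERT INTO members (user_id, status) VALUES (1, 'Green')")
    allowed = True
except sqlite3.IntegrityError:
    allowed = False

print(allowed)  # False: the unique constraint blocks a valid insert
```

This is exactly the case the edit describes: the constraint cannot distinguish "duplicate of the latest status" (which should fail) from "returning to an earlier status" (which should succeed).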
Upvotes: 1
Views: 606
Reputation: 3303
When you add a unique constraint to a table, Postgres automatically creates a unique index for it. During index creation, the database checks all existing rows in the table to verify that they satisfy the index. So before you can make the user_id and status fields unique, the existing duplicate values for those fields must be removed.
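The following sketch (using Python's sqlite3 module as a stand-in for Postgres, with the sample data from the question) illustrates this: creating the unique index fails while duplicate rows are still present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)"
)
# Sample data: ids 1, 4, 5 are duplicates for (user_id=1, status='Green').
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [(1, 1, 'Green'), (2, 2, 'Green'), (3, 1, 'Blue'),
     (4, 1, 'Green'), (5, 1, 'Green')],
)

try:
    # Index creation scans the whole table and fails on existing duplicates.
    conn.execute("CREATE UNIQUE INDEX uq_user_status ON test_table (user_id, status)")
    created = True
except sqlite3.DatabaseError:
    created = False

print(created)  # False: duplicates must be deleted first
```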
I wrote a simple query for deleting duplicate data.
Example Query:
with dubl_data as materialized
(
    select
        id,
        user_id,
        status,
        max(id) over (partition by user_id, status) as max_id,
        count(id) over (partition by user_id, status) as count_id
    from
        test_table
)
delete from test_table t1
using dubl_data t2
where
    t1.id = t2.id
    and t2.count_id > 1
    and t2.max_id <> t2.id;
Here is how the query works: the count_id > 1 condition selects only the duplicated rows. max(id) computes the highest id within each (user_id, status) group, and the t2.max_id <> t2.id condition then deletes every duplicate except the newest one.
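The same window-function logic can be sketched in Python's sqlite3 module (a stand-in for Postgres; SQLite has no DELETE ... USING, so the condition is expressed as DELETE ... WHERE id IN (subquery), and window functions require SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)"
)
# Sample data from the question: ids 1, 4, 5, 6 all share (1, 'Green').
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [(1, 1, 'Green'), (2, 2, 'Green'), (3, 1, 'Blue'),
     (4, 1, 'Green'), (5, 1, 'Green'), (6, 1, 'Green')],
)

# Delete every duplicate row except the one with the highest id per group.
conn.execute("""
    DELETE FROM test_table
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   max(id)   OVER (PARTITION BY user_id, status) AS max_id,
                   count(id) OVER (PARTITION BY user_id, status) AS count_id
            FROM test_table
        )
        WHERE count_id > 1 AND id <> max_id
    )
""")

remaining = [r[0] for r in conn.execute("SELECT id FROM test_table ORDER BY id")]
print(remaining)  # [2, 3, 6]: only the newest (1, 'Green') row survives
```

After the deduplication, the unique index on (user_id, status) can be created. Note, though, that this alone enforces global uniqueness of the pair, which is stricter than the asker's "latest record" requirement.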
Upvotes: 1