Reputation: 11499
I have a column tracking unique user_id by project (for external use). I want to increment the user_id column when creating a new user, based on whatever the count is at for that project. There are many existing records already, but from here on out we want the user_id to increment by 1 for each new record in a given project_id.
id   user_id   project_id
-------------------------
1    100       1
2    101       1
3    1000      2
4    1001      2
5    17        3
6    18        3
7    102       1
New row with project_id = 1 should use user_id = 103
New row with project_id = 2 should use user_id = 1002
New row with project_id = 3 should use user_id = 19
How can I construct the user_id column and/or INSERT query such that it will always increment the user_id based on the largest existing user_id within the corresponding project_id, and guarantee that no two users in the same project are assigned the same user_id upon concurrent inserts?
Upvotes: 2
Views: 2747
Reputation: 32693
The straightforward way to guarantee that no two users in the same project are assigned the same user_id upon concurrent inserts is to prevent concurrent activity.
One way to achieve it is to set transaction isolation level to Serializable.
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- here I took the query from George Joseph's answer;
-- @project_id is a placeholder for the project id you pass in
insert into user_table
    (user_id, project_id)
select
    coalesce(max(user_id), 0) + 1 as user_id
    ,@project_id as project_id
from
    user_table
where
    project_id = @project_id;

COMMIT TRANSACTION;
You can run this query block from several sessions simultaneously and the engine will handle the concurrency behind the scenes. One detail of how Postgres does it: concurrent serializable transactions do not simply wait for the previous one to finish; if two of them conflict, one is aborted with a serialization failure (SQLSTATE 40001) and the application has to retry it.
For this to work efficiently you'll need an index on (project_id, user_id). You also need to make it unique to enforce your constraint. The order of columns in this index is important.
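A minimal sketch of that index, assuming the table is named user_table as in the query above (the index name is arbitrary):

-- unique composite index: makes the per-project MAX lookup fast and
-- rejects duplicate user_ids within a project as a safety net
CREATE UNIQUE INDEX user_table_project_user_uq
    ON user_table (project_id, user_id);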
You also mentioned that you expect thousands of projects, and eventually up to millions of users per project. This adds up to a billion rows, which is quite a lot to run MAX against for each insert, even with an appropriate index.
You can create a separate table project_sequences to store the last value of user_id for each project_id. This table will have two columns, project_id and last_user_id, with the primary key on both of them (project_id, last_user_id). The order of columns in the index is important.
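A minimal sketch of that table, using the names from this answer:

-- one counter row per project
CREATE TABLE project_sequences (
    project_id   integer NOT NULL,
    last_user_id integer NOT NULL,
    PRIMARY KEY (project_id, last_user_id)
);

Since the table is meant to hold exactly one row per project, a primary key on project_id alone also works, and is what the ON CONFLICT sketch further down requires.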
Now, for each insert into the main large table, you only query and update the small project_sequences table with its roughly 1000 rows. I'm not familiar with Postgres syntax for variables, so below is pseudo-code.
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- read the last_user_id for the given project_id from the small table
-- into a variable and increment it;
-- update the small table with the new last_user_id;
-- use the freshly generated user_id to insert into the main table.

-- or, without variables:

-- increment the last_user_id
update project_sequences
set last_user_id =
    (
    select coalesce(max(last_user_id), 0) + 1
    from project_sequences
    where project_id = @project_id
    )
where
    project_id = @project_id;

-- use the new id to insert into the main table
insert into user_table
    (user_id, project_id)
select
    last_user_id
    ,@project_id as project_id
from
    project_sequences
where
    project_id = @project_id;

COMMIT TRANSACTION;
With variables it will be a bit easier to handle the case when the given project_id is a new one that doesn't exist in the table yet, and to set the new user_id to start at 1, or whatever starting value you need.
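For what it's worth, actual Postgres lets you skip the variables entirely with an upsert plus RETURNING. A hedged sketch, not from the original answer: it assumes project_sequences has a unique constraint on project_id alone (required for ON CONFLICT to target it), and keeps @project_id as a placeholder for the value you pass in:

BEGIN;
-- bump the counter, creating it at 1 for a brand-new project
WITH next AS (
    INSERT INTO project_sequences (project_id, last_user_id)
    VALUES (@project_id, 1)
    ON CONFLICT (project_id)
    DO UPDATE SET last_user_id = project_sequences.last_user_id + 1
    RETURNING last_user_id
)
INSERT INTO user_table (user_id, project_id)
SELECT last_user_id, @project_id
FROM next;
COMMIT;

The row lock taken by the upsert serializes concurrent inserts for the same project on its own, so the SERIALIZABLE level is not strictly needed here.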
Upvotes: 3
Reputation: 1881
I suggest using a BEFORE INSERT trigger; that way you're 99.99% sure you won't have duplicates or holes in the sequence (101, 102, missing, 111, 112).
The problem with sequences is that, if not used carefully, you can lose control of the current number and end up with missing numbers in the database.
Just let the trigger take care of incrementing the number.
This way you also don't have to worry about complicated queries that consume a lot of memory and processing power.
The trigger is:
CREATE OR REPLACE FUNCTION set_user_id()
RETURNS trigger AS $set_user_id$
BEGIN
    IF NEW.user_id IS NULL THEN
        -- next id = highest existing id in this project (or 0 if none) plus 1
        NEW.user_id := COALESCE(
            (SELECT MAX(user_id) FROM data WHERE project_id = NEW.project_id),
            0) + 1;
    END IF;
    RETURN NEW;
END $set_user_id$ LANGUAGE plpgsql;

CREATE TRIGGER table_user_id
BEFORE INSERT ON data
FOR EACH ROW EXECUTE PROCEDURE set_user_id();
NOTES:

The trigger only fills in user_id when you don't supply one:

INSERT INTO data (project_id) VALUES (1);

or

INSERT INTO data (user_id, project_id) VALUES (NULL, 1);

With your example data, this would insert

id   user_id   project_id
-------------------------
8    103       1

A new project_id starts at user_id = 1 (thanks to the COALESCE):

INSERT INTO data (project_id) VALUES (4);

With your example data, this would insert

id   user_id   project_id
-------------------------
9    1         4

An explicitly supplied user_id is kept as-is:

INSERT INTO data (user_id, project_id) VALUES (10, 5);

With your example data, this would insert

id   user_id   project_id
-------------------------
10   10        5
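One caveat, not from the original answer: under concurrent inserts two sessions can both read the same MAX before either commits, so a unique index (hypothetical name below) is a useful backstop that turns a silent duplicate into an error you can retry:

-- turns a lost race into a unique_violation error instead of a duplicate
CREATE UNIQUE INDEX data_project_user_uq ON data (project_id, user_id);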
Upvotes: 0
Reputation: 2715
You need to use a WITH clause. Here is the implementation:
-- PostgreSQL 9.6
create table tab
(
    id serial,
    user_id integer,
    project_id integer
);

INSERT INTO tab(user_id, project_id) VALUES (100, 1);
INSERT INTO tab(user_id, project_id) VALUES (101, 1);
INSERT INTO tab(user_id, project_id) VALUES (1000, 2);
INSERT INTO tab(user_id, project_id) VALUES (1001, 2);
INSERT INTO tab(user_id, project_id) VALUES (17, 3);
INSERT INTO tab(user_id, project_id) VALUES (18, 3);
INSERT INTO tab(user_id, project_id) VALUES (102, 1);

-- src holds one row per project that needs a new user
create table src
(
    project_id integer
);

insert into src values (1);
insert into src values (2);
insert into src values (3);

select * from src;
select * from tab;
with cur as
(
    select project_id, max(user_id) as max_user_id
    from tab
    group by project_id
)
INSERT INTO tab(user_id, project_id)
SELECT cur.max_user_id + row_number() over (partition by src.project_id),
       src.project_id
from src
inner join cur on src.project_id = cur.project_id;

select * from tab order by project_id, user_id;
Result:

src:

project_id
----------
1
2
3

tab, before the insert:

id   user_id   project_id
-------------------------
1    100       1
2    101       1
3    1000      2
4    1001      2
5    17        3
6    18        3
7    102       1

tab, after the insert (ordered by project_id, user_id):

id   user_id   project_id
-------------------------
1    100       1
2    101       1
7    102       1
8    103       1
3    1000      2
4    1001      2
9    1002      2
5    17        3
6    18        3
10   19        3
https://rextester.com/HREM53701
Read more about the WITH clause here:
https://www.tutorialspoint.com/postgresql/postgresql_with_clause.htm
Upvotes: 2
Reputation: 7
For auto-incrementing your id you can use 3 methods (Postgres equivalents of the first two are sketched after this list):

1 - use an identity column when creating the table (SQL Server syntax):

create table a (key int identity(1,1))
-- the first "1" is the initial value
-- the second "1" is the increment added for each next value

2 - create a sequence:

create sequence seq_name
as data_type      -- e.g. bigint
start with 1      -- initial value
increment by 1    -- incrementing value

ref - https://www.techonthenet.com/sql_server/sequences.php

3 - from application code, select the current max of the column, add one to the retrieved value, and use that as the id for the newly created row.
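Since the question is about Postgres, here is a rough translation of methods 1 and 2 into Postgres syntax; the table and sequence names are placeholders:

-- method 1: an identity column (Postgres 10+)
create table a (key int generated always as identity);

-- method 2: an explicit sequence, consumed with nextval()
create sequence seq_name as bigint start with 1 increment by 1;
select nextval('seq_name');

Note that an identity column or sequence is a single global counter, so on its own it does not give the per-project numbering the question asks for.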
Upvotes: -1
Reputation: 5932
You can do this by finding the max value of user_id for the given project_id and then incrementing it by 1. In a multi-user scenario you will also need some kind of serialization to ensure that concurrent users do not end up with the same number. E.g., assuming you pass the project_id as a variable @project_id:
insert
into user_table
    (user_id
    ,project_id
    )
select
    -- coalesce so the first user in a brand-new project gets 1
    (select coalesce(max(user_id), 0) + 1
     from user_table
     where project_id = @project_id) as user_id
    ,@project_id;
Upvotes: 0