Tyler

Reputation: 11499

Increment counter based on column match (postgres)

I have a column tracking unique user_id by project (for external use).

I want to increment the user_id column when creating a new user, subject to whatever the count is at for that project. There are many existing records already, but from here on out we want the user_id to increment up by 1 for each new record in a given project_id.

Example User table

id   user_id   project_id
-------------------------
1    100       1
2    101       1
3    1000      2
4    1001      2
5    17        3
6    18        3
7    102       1

New row with project_id = 1 should use user_id = 103
New row with project_id = 2 should use user_id = 1002
New row with project_id = 3 should use user_id = 19

How can I construct the user_id column and/or INSERT query such that it will always increment the user_id based on the largest existing user_id within the corresponding project_id, and guarantee that no two users in the same project are assigned the same user_id upon concurrent inserts?

Upvotes: 2

Views: 2747

Answers (5)

Vladimir Baranov

Reputation: 32693

The straightforward way to guarantee that no two users in the same project are assigned the same user_id upon concurrent inserts is to prevent concurrent activity.

One way to achieve this is to set the transaction isolation level to SERIALIZABLE.

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- here I took the query from George Joseph's answer;
-- @project_id stands for the project id parameter
insert into user_table
    (user_id, project_id)
select
    coalesce(max(user_id), 0) + 1 as user_id
    ,@project_id as project_id
from
    user_table
where
    project_id = @project_id;

COMMIT;

You can run this query block from several sessions simultaneously and the engine will handle the concurrency. Note, though, that Postgres does not implement SERIALIZABLE by making transactions wait for each other: if two such transactions conflict, one of them is aborted with a serialization failure (SQLSTATE 40001) and the application has to retry it.

For this to work efficiently you'll need an index on (project_id, user_id). Making it unique also enforces your constraint that no two users in a project share a user_id. The order of columns in this index is important.
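Assuming the table is called user_table as in the query above (the index name here is arbitrary), that index could be created like this:

```sql
-- unique: enforces one user_id per project and lets
-- MAX(user_id) ... WHERE project_id = ? use the index
create unique index user_table_project_user_uq
    on user_table (project_id, user_id);
```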


You also mentioned that you expect thousands of projects, and eventually up to millions of users per project. That adds up to a billion rows, which is quite a lot to run MAX against on every insert, even with the appropriate index.

You can create a separate table project_sequences to store the last value of user_id for each project_id. It needs just two columns, project_id and last_user_id, and since there is one row per project, project_id alone serves as the primary key.
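A sketch of that table (the names are just for illustration):

```sql
create table project_sequences
(
    project_id   integer primary key,  -- one row per project
    last_user_id integer not null      -- last user_id handed out
);
```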

Now every insert into the main large table only has to touch the small project_sequences table (about 1000 rows for 1000 projects). In Postgres the read-increment-insert sequence does not even need variables: a data-modifying CTE can do it in one statement, and the row lock taken by the UPDATE serializes concurrent inserts for the same project even at the default READ COMMITTED isolation level.

BEGIN;

-- increment the counter for this project and capture the new value
with bumped as
(
    update project_sequences
    set last_user_id = last_user_id + 1
    where project_id = @project_id
    returning last_user_id
)
-- use the freshly generated user_id to insert into the main table
insert into user_table
    (user_id, project_id)
select
    last_user_id
    ,@project_id as project_id
from
    bumped;

COMMIT;

You will also need to handle the case when the given project_id is a new one that doesn't exist in project_sequences yet, and seed the counter with 1 or whatever starting value you need.
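One Postgres-specific way to do that (a sketch; it assumes the project_sequences table described above and Postgres 9.5+ for ON CONFLICT):

```sql
-- seed the counter for a brand-new project, or bump an existing one;
-- RETURNING hands back the user_id to use for the new user
insert into project_sequences (project_id, last_user_id)
values (@project_id, 1)
on conflict (project_id)
do update set last_user_id = project_sequences.last_user_id + 1
returning last_user_id;
```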

Upvotes: 3

Dan

Reputation: 1881

I suggest using a trigger before the insert; that way you're 99.99% sure you won't have duplicates or holes in the sequence (101, 102, missing, 111, 112).

The problem with sequences is that, if not used carefully, you can lose control of the current number and end up with missing numbers in the database.

Just make the trigger take care of incrementing the number. This way you also don't have to worry about complicated queries that consume a lot of memory and processing power.

The trigger is:

CREATE OR REPLACE FUNCTION set_user_id()
    RETURNS trigger AS $set_user_id$
BEGIN
    IF NEW.user_id IS NULL
    THEN
        NEW.user_id := COALESCE( ( SELECT MAX(user_id) FROM data WHERE project_id = NEW.project_id ), 0 ) + 1;
    END IF;
    RETURN NEW;
END $set_user_id$ LANGUAGE plpgsql;

CREATE TRIGGER table_user_id
    BEFORE INSERT ON data
    FOR EACH ROW EXECUTE PROCEDURE set_user_id();
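One caveat: as written, two concurrent inserts for the same project can read the same MAX and try to assign the same user_id. A way to close that race inside the trigger (a sketch, not from the original answer) is a transaction-scoped advisory lock keyed on the project:

```sql
CREATE OR REPLACE FUNCTION set_user_id()
    RETURNS trigger AS $set_user_id$
BEGIN
    IF NEW.user_id IS NULL
    THEN
        -- serialize concurrent inserts for the same project;
        -- the lock is released automatically at commit or rollback
        PERFORM pg_advisory_xact_lock(NEW.project_id);
        NEW.user_id := COALESCE( ( SELECT MAX(user_id) FROM data WHERE project_id = NEW.project_id ), 0 ) + 1;
    END IF;
    RETURN NEW;
END $set_user_id$ LANGUAGE plpgsql;
```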

NOTES:

  • It only assigns a user_id if the insert sends NULL for user_id. For example:

INSERT INTO data (project_id) VALUES (1);

or

INSERT INTO data (user_id,project_id) VALUES (NULL,1);

With your example data, this would insert

id   user_id   project_id
-------------------------
8    103       1
  • If there is no previous user_id for the project, the COALESCE makes the numbering start at 1

INSERT INTO data (project_id) VALUES (4);

With your example data, this would insert

id   user_id   project_id
-------------------------
9    1         4
  • If you want to set a starting user_id, you just have to set it on the first insert of that project_id.

INSERT INTO data (user_id,project_id) VALUES (10,5);

With your example data, this would insert

id   user_id   project_id
-------------------------
10   10        5

Upvotes: 0

Digvijay S

Reputation: 2715

You can use a WITH clause (common table expression).

Here is the implementation:

--PostgreSQL 9.6
create table tab
(
    id SERIAL ,
    user_id integer ,
    project_id integer 
 );

INSERT INTO tab(user_id, project_id) VALUES (100, 1);
INSERT INTO tab(user_id, project_id) VALUES (101, 1);
INSERT INTO tab(user_id, project_id) VALUES (1000, 2);
INSERT INTO tab(user_id, project_id) VALUES (1001, 2);
INSERT INTO tab(user_id, project_id) VALUES (17, 3);
INSERT INTO tab(user_id, project_id) VALUES (18, 3);
INSERT INTO tab(user_id, project_id) VALUES (102, 1);

create table src 
(
    project_id integer
);

insert into src values  (1); 
insert into src values (2);
insert into src values (3) ; 

select * from src ; 
select * from tab ; 

with cur as 
(
select project_id , max(user_id)  as max_user_id from tab group by project_id
)
INSERT INTO tab(user_id, project_id ) 
    SELECT cur.max_user_id +  row_number() over( partition by src.project_id  ) , src.project_id 
    from src inner join cur on src.project_id = cur.project_id  ;

select * from tab order by project_id , user_id ;

Result:

src:

project_id
----------
1
2
3

tab before the insert:

id   user_id   project_id
-------------------------
1    100       1
2    101       1
3    1000      2
4    1001      2
5    17        3
6    18        3
7    102       1

tab after the insert (ordered by project_id, user_id):

id   user_id   project_id
-------------------------
1    100       1
2    101       1
7    102       1
8    103       1
3    1000      2
4    1001      2
9    1002      2
5    17        3
6    18        3
10   19        3

https://rextester.com/HREM53701

Read more about with clause here

https://www.tutorialspoint.com/postgresql/postgresql_with_clause.htm

Upvotes: 2

supun randima

Reputation: 7

For auto-incrementing your id you can use three methods:

  1. Use an identity column when creating the table:

        create table a ( key int identity(1,1) )
        -- the first "1" is the initial value
        -- the second "1" is the increment added for each new row

     (Note: identity(1,1) is SQL Server syntax, not Postgres.)

  2. Create a sequence:

        create sequence seq_name
            as data_type      -- e.g. bigint
            start with 1      -- initial value
            increment by 1;   -- increment

     ref - https://www.techonthenet.com/sql_server/sequences.php

  3. From application code, run select max(col_name) from t_name, add one to the retrieved value, and use the result as the id for the newly created row.
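Since the question is about Postgres, the Postgres spellings of methods 1 and 2 would look like this (a sketch; the table and sequence names are placeholders):

```sql
-- method 1: identity column (Postgres 10+)
create table a
(
    key int generated always as identity (start with 1 increment by 1)
);

-- method 2: a standalone sequence, consumed with nextval()
create sequence seq_name as bigint start with 1 increment by 1;
select nextval('seq_name');
```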

Upvotes: -1

George Joseph

Reputation: 5932

You can do this by finding the max value of user_id for the given project_id and incrementing it by 1. In a multi-user scenario you will need some kind of serialization so that concurrent users do not end up with the same number. Eg: assuming you are going to pass the project_id as a variable @project_id

insert
  into user_table
       (user_id
        ,project_id
       )
select
       (select coalesce(max(user_id), 0) + 1  -- coalesce covers the first user in a project
          from user_table
         where project_id=@project_id) as user_id
       ,@project_id

Upvotes: 0
