pomo_mondreganto

Reputation: 2074

Postgres deadlock with (select for share + insert) and (select for update + update)

I have the following table (all examples are with psycopg2 python library):

CREATE TABLE IF NOT EXISTS TeamTasks
        (
            id              SERIAL PRIMARY KEY,
            round           INTEGER,
            task_id         INTEGER,
            team_id         INTEGER,
            score           FLOAT         DEFAULT 0,
            UNIQUE (round, task_id, team_id)
        )

And I have two functions:

  1. Take the TeamTasks rows from the previous rounds and copy them to the new round, with no updates allowed between the SELECT and the INSERT. That function is implemented as follows:
query = """
    WITH prev_table AS (
        SELECT score FROM teamtasks 
        WHERE task_id = %(task_id)s AND team_id = %(team_id)s AND round <= %(round)s - 1
        ORDER BY round DESC LIMIT 1 
        FOR SHARE
    )
    INSERT INTO TeamTasks (task_id, team_id, round, score) 
    SELECT %(task_id)s, %(team_id)s, %(round)s, score
    FROM prev_table;
"""

with aux.db_cursor() as (conn, curs):    
    for team_id in range(team_count):
        for task_id in range(task_count): 
            curs.execute(
                query, 
                {
                    'task_id': task_id,
                    'team_id': team_id,
                    'round': cur_round + 1,
                },
            )
    conn.commit()

aux.db_cursor is just a convenient wrapper to get psycopg2 connection and cursor.
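The question only describes aux.db_cursor as a convenience wrapper, so its exact implementation is unknown; a minimal sketch of what such a wrapper might look like is below. The connect-factory parameter is an assumption made here to keep the sketch self-contained and testable; in practice it would likely be psycopg2.connect bound to the right DSN.

```python
# Hypothetical sketch of aux.db_cursor -- NOT the asker's actual code.
# It yields a (connection, cursor) pair and closes both on exit.
from contextlib import contextmanager


@contextmanager
def db_cursor(connect):
    """Yield (conn, curs); close the cursor and connection afterwards.

    `connect` is a zero-argument connection factory, e.g.
    functools.partial(psycopg2.connect, dsn) in a real setup.
    """
    conn = connect()
    try:
        curs = conn.cursor()
        try:
            yield conn, curs
        finally:
            curs.close()
    finally:
        conn.close()
```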

  2. Update rows in TeamTasks for a specific pair of teams, a specific task, and multiple rounds. It's implemented like this:
# I have team1_id, team2_id and task_id

query1 = "SELECT score from teamtasks WHERE team_id=%s AND task_id=%s AND round=%s FOR NO KEY UPDATE"

query2 = "UPDATE teamtasks SET score = %s WHERE team_id=%s AND task_id=%s AND round >= %s"

with aux.db_cursor() as (conn, curs):
    curs.execute(query1, (team1_id, task_id, cur_round))
    score1, = curs.fetchone()

    curs.execute(query1, (team2_id, task_id, cur_round))
    score2, = curs.fetchone()

    sleep(0.1)  # Here happens something time-consuming

    curs.execute(query2, (score1 + 0.1, team1_id, task_id, cur_round))
    curs.execute(query2, (score2 - 0.1, team2_id, task_id, cur_round))

    conn.commit()

I can guarantee that each team is the subject of at most one update at a time in the second function, so all concurrently updated teams are always distinct.

Moreover, the first function runs considerably more rarely, and nothing else updates this table except these two functions, so the lock in the first function exists precisely so that the table isn't changed during the TeamTasks copy.

In the environment described above, I encounter a lot of deadlocks like the following:

postgres_1  | 2019-11-17 20:43:08.510 UTC [49] ERROR:  deadlock detected
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] DETAIL:  Process 49 waits for ShareLock on transaction 685; blocked by process 65.
postgres_1  |   Process 65 waits for ShareLock on transaction 658; blocked by process 49.
postgres_1  |   Process 49:
postgres_1  |           WITH prev_table AS (
postgres_1  |               SELECT score FROM teamtasks
postgres_1  |               WHERE task_id = 8 AND team_id = 6 AND round <= 1 - 1
postgres_1  |               ORDER BY round DESC LIMIT 1
postgres_1  |               FOR SHARE
postgres_1  |           )
postgres_1  |           INSERT INTO TeamTasks (task_id, team_id, round, score)
postgres_1  |           SELECT 8, 6, 1, score
postgres_1  |           FROM prev_table;
postgres_1  |
postgres_1  |   Process 65: SELECT score from teamtasks WHERE team_id=0 AND task_id=8 AND round=0 FOR NO KEY UPDATE
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] HINT:  See server log for query details.
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] CONTEXT:  while locking tuple (0,69) in relation "teamtasks"
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] STATEMENT:
postgres_1  |           WITH prev_table AS (
postgres_1  |               SELECT score FROM teamtasks
postgres_1  |               WHERE task_id = 8 AND team_id = 6 AND round <= 1 - 1
postgres_1  |               ORDER BY round DESC LIMIT 1
postgres_1  |               FOR SHARE
postgres_1  |           )
postgres_1  |           INSERT INTO TeamTasks (task_id, team_id, round, score)
postgres_1  |           SELECT 8, 6, 1, score
postgres_1  |           FROM prev_table;

How to fix these deadlocks? Is there a neat solution I'm not seeing?

Upvotes: 0

Views: 371

Answers (1)

richyen

Reputation: 10018

select for share seems unnecessary here. That syntax is for preserving referential integrity. In your case, you are selecting from and inserting into the same teamtasks table, so you are unnecessarily holding locks that cause your two connections to block each other (and it would ultimately be nice to refactor the code so you only use one connection, but I don't know how feasible that is for you). As far as I know, the select for share syntax has more to do with updates and referential integrity to other tables than it has to do with inserts into the same table.

The problem is that in the first aux.db_cursor() call, you are taking FOR SHARE locks on several rows in teamtasks as you loop through range(team_count) and range(task_count) -- and then in the second aux.db_cursor() call you are doing a time-consuming task before doing an UPDATE on some rows. Those UPDATE lock requests are going to collide with those FOR SHARE locks. I'd get rid of the FOR SHARE locks, unless you really need them (at which point I would look for ways to consolidate it all into one DB connection, if possible).
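Following that suggestion, the copy query from the question would simply drop the FOR SHARE clause. The ON CONFLICT DO NOTHING clause below is an extra assumption on my part, not something from the question: it makes re-runs idempotent under the existing UNIQUE (round, task_id, team_id) constraint, which may or may not fit the asker's requirements.

```python
# The question's copy query with FOR SHARE removed, per the advice above.
# ON CONFLICT DO NOTHING is an added assumption for idempotent re-runs.
query = """
    WITH prev_table AS (
        SELECT score FROM teamtasks
        WHERE task_id = %(task_id)s AND team_id = %(team_id)s AND round <= %(round)s - 1
        ORDER BY round DESC LIMIT 1
    )
    INSERT INTO TeamTasks (task_id, team_id, round, score)
    SELECT %(task_id)s, %(team_id)s, %(round)s, score
    FROM prev_table
    ON CONFLICT DO NOTHING;
"""
```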

Disclosure: I work for EnterpriseDB (EDB)

Upvotes: 1
