Reda

Reputation: 317

Lock specific table rows to insert a new row

I have an Operations table with the columns sourceId, destinationId, amount, status. Whenever a user makes a transfer, the API inserts a new row into that table after checking the user's balance, which it calculates as the sum of credit operations minus the sum of debit operations. Only when the balance is greater than or equal to the transfer amount is the operation inserted with a successful status.
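
For context, the balance check boils down to a query along these lines (a sketch only; it assumes amounts are stored as positive numbers and the user id is passed as $1):

SELECT COALESCE(SUM(CASE WHEN "DestinationId" = $1 THEN "Amount"   -- credit received
                         ELSE -"Amount"                            -- debit sent
                    END), 0) AS balance
FROM "Operations"
WHERE ("SourceId" = $1 OR "DestinationId" = $1)
  AND "Status" = 'SUCCESSFUL';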

The issue is concurrency since a user performing multiple transfers at the same time might end up with a negative balance.

There are multiple ways of handling this concurrency issue with PostgreSQL:

  1. Serializable transaction isolation level
  2. Table locking
  3. Row versioning
  4. Row locking
    etc.

The expected behavior is that, instead of failing with a unique violation on (sourceId, version), the database should wait for the previous transaction to finish and pick up the latest inserted version, without setting the transaction isolation level to SERIALIZABLE.

However, I am not completely sure about the best approach. Here's what I tried:

1. Serializable transaction isolation level

This is the easiest approach, but the problem is lock escalation: if the database engine is under heavy load, a single transaction can end up locking the whole table, which is the documented behavior.

Pseudo-code:

newId = INSERT INTO "Operations" ("SourceId", "DestinationId", "Amount", "Status", "OccuredAt")
        VALUES (2, 3, 100, 'PENDING', null) RETURNING "Id";
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM "Operations" WHERE ("SourceId" = 2 OR "DestinationId" = 2) AND "Status" = 'SUCCESSFUL';
'''API: check if balance >= transfer amount'''
UPDATE "Operations" SET "Status" = 'SUCCESSFUL' WHERE "Id" = newId;
COMMIT;

2. Table locking

This is what we want to avoid by NOT using the serializable transaction isolation level.

3. Row versioning

This approach seems best so far performance-wise. We added an integer version column and a unique index on (sourceId, version), and each new operation is inserted with the next version. If two transactions are concurrent, the database throws an error:

duplicate key value violates unique constraint "IX_Transactions_SourceWalletId_Version"
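
For reference, the schema change behind this approach looks roughly like this (object names are assumed; the error message above suggests the real index lives on a Transactions table keyed by SourceWalletId):

ALTER TABLE "Operations" ADD COLUMN "Version" int;
CREATE UNIQUE INDEX "IX_Operations_SourceId_Version"
    ON "Operations" ("SourceId", "Version");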

Pseudo-code:

newId = INSERT INTO "Operations" ("SourceId", "DestinationId", "Amount", "Status", "OccuredAt")
        VALUES (2, 3, 100, 'PENDING', null) RETURNING "Id";
BEGIN;
lastVersion = SELECT o."Version"
     FROM "Operations" o
     WHERE o."SourceId" = 2 AND o."Version" IS NOT NULL
     ORDER BY o."Version" DESC
     LIMIT 1;
SELECT * FROM "Operations" WHERE ("SourceId" = 2 OR "DestinationId" = 2)
   AND "Status" = 'SUCCESSFUL';
'''API: check if balance >= transfer amount'''
UPDATE "Operations" SET "Status" = 'SUCCESSFUL', "Version" = lastVersion + 1 WHERE "Id" = newId;
COMMIT;

4. Row locking

Before calculating the user's balance, lock all operation rows with sourceWalletId = x (where x is the user making the transfer). But I can't find a way of doing this in PostgreSQL. Using FOR UPDATE does the trick, but after a concurrent transaction waits on the first one, the result does not return the newly inserted row, which is the documented behavior for PostgreSQL.
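
The attempt looked roughly like this (a sketch only; FOR UPDATE only locks rows that already exist, so it cannot block or reveal a row that a concurrent transaction is still inserting):

BEGIN;
SELECT * FROM "Operations"
WHERE ("SourceId" = 2 OR "DestinationId" = 2) AND "Status" = 'SUCCESSFUL'
FOR UPDATE;
'''API: check the balance, then UPDATE the new row as above'''
COMMIT;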

Upvotes: 4

Views: 2109

Answers (3)

Emmef

Reputation: 522

Number one in your example, the serializable variant, is the only one that will guarantee correct behaviour without the code having to retry the transaction when the update count is zero or the transaction was rolled back. It is also the simplest to reason about. By the way, REPEATABLE READ would also be good enough for already existing rows.

Number 3 in your example, which looks somewhat like optimistic locking, might look more performant, but that depends on the type of load. Updating an index, in your case the unique index, can also be a performance hit. And generally, you have less control over locks on indexes, which makes the situation less deterministic and more difficult to reason about. Also, it is still possible to suffer from different read values at any transaction isolation level below REPEATABLE READ.

Another thing is that your implementation behaves incorrectly in the following scenario:

  • Process 1 starts and reads version 1
  • Process 2 starts and reads version 1
  • Process 2 succeeds and writes version 2
  • Process 3 starts and reads version 2
  • Process 3 succeeds and writes version 3
  • Process 1 succeeds and writes version 2 // NO VIOLATION!

What does work is optimistic locking, which looks somewhat like your number 3. Pseudo code / SQL:

BEGIN
  SELECT "version", "amount" FROM "operations" WHERE "id" = identifier
  // set variable oldVersion to version
  // set variable newVersion to version + 1
  // set variable newAmount to amount + amountOfOperation
  IF newAmount < 0
    ROLLBACK
  ELSE
    UPDATE "operations" SET "version" = newVersion, "amount" = newAmount
      WHERE "id" = identifier AND "version" = oldVersion
    COMMIT
  ENDIF

This does not require a unique index containing version. And in general, the condition in the WHERE clause of the UPDATE and the update itself are correct even at the READ COMMITTED transaction isolation level. I am not certain about PostgreSQL, though: do verify this in the documentation!
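
As an illustration only, the version check and the balance guard could even be folded into a single statement, reusing the variable names from the pseudo code above (zero affected rows means the caller has to retry from the start):

UPDATE "operations"
SET    "version" = "version" + 1,
       "amount"  = "amount" + amountOfOperation     -- amountOfOperation is negative for a debit
WHERE  "id" = identifier
  AND  "version" = oldVersion                       -- optimistic version check
  AND  "amount" + amountOfOperation >= 0;           -- never let the balance go negative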

In general, I would start with the serializable example (number 1) until measurements on real use cases show that it is a problem. Then try optimistic locking and see whether it improves things, again with actual use cases.

Do remember that the code must always be able to replay the transaction from BEGIN to END if the UPDATE reports zero updated rows, or if the transaction fails.

Good luck and have fun!

Upvotes: 1

Erwin Brandstetter

Reputation: 656231

using for update does the trick, but after a concurrent transaction waits on the first one, the result does not return the newly inserted row, which is the documented behavior for PostgreSQL.

Kind of true, but also not a show-stopper. Yes, in the default READ COMMITTED transaction isolation, each statement only sees rows that were committed before the query began. The query, mind you, not the transaction.

Just start the next query in the same transaction after acquiring the lock.

Assuming a table holding exactly one row per (relevant) user (as you should have). I'll call it "your_wallet_table", based on the cited "sourceWalletId":

BEGIN;
SELECT FROM "your_wallet_table" WHERE "sourceWalletId" = x FOR UPDATE;
-- x is the user making the transfer

-- check the user's balance (separate query!)

INSERT INTO "Operations" ...  -- 'SUCCESSFUL' or 'PENDING'

COMMIT;

The lock is only acquired once no other transaction is working on the same user, and only released at the end of the transaction.

The next transaction will see all committed rows in its next statement. If all transactions stick to this modus operandi, all is fine. Of course, transactions cannot be allowed to change rows affecting the balance of other users.

Upvotes: 2

Laurenz Albe

Reputation: 246023

You will have to take a performance hit. SERIALIZABLE isolation would be the easiest way. You can increase max_pred_locks_per_relation, max_pred_locks_per_transaction or max_pred_locks_per_page (depending on which limit you hit) so that predicate locks are escalated to coarser granularity later.
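
For example (the values are only illustrative; max_pred_locks_per_transaction sizes shared memory, so changing it requires a server restart, while the other two parameters only need a reload):

ALTER SYSTEM SET max_pred_locks_per_transaction = 128;
ALTER SYSTEM SET max_pred_locks_per_page = 4;
SELECT pg_reload_conf();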

Alternatively, you could have a table that stores the balance per user, which is updated by a deferred trigger. Then you can have a check constraint on that table. This will serialize operations only per user.
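
A minimal sketch of that idea, assuming the column names from the question and ignoring the PENDING/SUCCESSFUL status handling for brevity (all object names are made up):

CREATE TABLE "UserBalances" (
    "UserId"  bigint PRIMARY KEY,
    "Balance" numeric NOT NULL DEFAULT 0 CHECK ("Balance" >= 0)
);

CREATE FUNCTION apply_operation() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Debit the sender and credit the receiver. The CHECK constraint aborts
    -- the whole transaction if the sender's balance would go negative, and
    -- the row lock taken by the UPDATE serializes transfers per user only.
    UPDATE "UserBalances" SET "Balance" = "Balance" - NEW."Amount"
    WHERE  "UserId" = NEW."SourceId";
    UPDATE "UserBalances" SET "Balance" = "Balance" + NEW."Amount"
    WHERE  "UserId" = NEW."DestinationId";
    RETURN NULL;
END
$$;

-- Deferred, so the balance rows are only locked briefly at COMMIT time.
CREATE CONSTRAINT TRIGGER operations_apply_balance
AFTER INSERT ON "Operations"
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE FUNCTION apply_operation();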

Upvotes: 0
