fishbone

Reputation: 3259

Prevent lost updates with high transaction isolation levels: Is this a common misconception?

I noticed that my applications often write values to a database that depend on a previous read operation. A common example is a bank account where a user could deposit money:

void deposit(amount) {
    balance = getAccountBalance()
    setAccountBalance(balance + amount)
}
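
In SQL terms, assuming a hypothetical accounts(id, balance) table, the two calls roughly correspond to something like:

-- getAccountBalance(): read the current balance, e.g. 100
SELECT balance FROM accounts WHERE id = 1;

-- setAccountBalance(balance + amount): write back a value computed in the application
UPDATE accounts SET balance = 110 WHERE id = 1;   -- e.g. 100 + 10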

I want to avoid a race condition where the account owner would lose money if this method is called by two threads/clients/ATMs simultaneously, like this:

balance = getAccountBalance()       |
                                    | balance = getAccountBalance()
setAccountBalance(balance + amount) |
                                    | // balance2 = getAccountBalance() // theoretical
                                    | setAccountBalance(balance + amount)
                                    V

I often read that Repeatable Read or Serializable can solve this problem. Even the German Wikipedia article for Lost Updates states this. Translated to English:

The isolation level RR (Repeatable Read) is often mentioned as a solution to the lost update problem.

This SO answer suggests Serializable for a similar problem with INSERT after SELECT.

As far as I understand the idea - at the time the process on the right side tries to set the account balance, a (theoretical) read operation would no longer return the same balance. Therefore the write operation is not allowed. And yes - if you read this popular SO answer, it actually sounds perfectly fitting:

under REPEATABLE READ the second SELECT is guaranteed to display at least the rows that were returned from the first SELECT unchanged. New rows may be added by a concurrent transaction in that one minute, but the existing rows cannot be deleted nor changed.

But then I wondered what "cannot be deleted nor changed" actually means. What happens if you try to delete or change them anyway? Will you get an error? Or will your transaction wait until the first transaction has finished and in the end still perform its update? This makes all the difference: in the second case you will still lose money.

And if you read the comments below that answer, it gets even worse, because there are other ways to meet the Repeatable Read conditions. For example, snapshots: a snapshot could be taken before the left-side transaction writes its value, which makes it possible to return the original value if a second read occurs later in the right-side transaction. See, for instance, the MySQL manual:

Consistent reads within the same transaction read the snapshot established by the first read
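
To make that concrete, here is a sketch of the original scenario on MySQL/InnoDB under REPEATABLE READ, again assuming the hypothetical accounts(id, balance) table with an initial balance of 100. As far as I can tell, the second UPDATE simply waits for the row lock and then overwrites the first one:

-- Session A:
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;        -- reads 100

-- Session B:
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;        -- also reads 100 (its own snapshot)

-- Session A:
UPDATE accounts SET balance = 110 WHERE id = 1;   -- 100 + 10 computed in the application

-- Session B:
UPDATE accounts SET balance = 150 WHERE id = 1;   -- 100 + 50, based on the stale read; blocks on A's row lock

-- Session A:
COMMIT;                                           -- B's UPDATE now proceeds

-- Session B:
COMMIT;                                           -- final balance: 150, the deposit of 10 is lost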

I came to the conclusion that restricting the transaction isolation level is probably the wrong tool to get rid of the race condition. If it solves the problem (for a specific DBMS), it's not due to the definition of Repeatable Read, but rather due to the specific implementation that fulfils the Repeatable Read conditions - for instance the use of locks.

So, to me it looks like this: what you actually need to solve this issue is an explicit locking mechanism. The fact that some DBMSs use locks to implement Repeatable Read is merely exploited.
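
For concreteness, the kind of explicit locking I have in mind is a sketch like this (again the hypothetical accounts(id, balance) table), where SELECT ... FOR UPDATE takes the row lock up front so that a concurrent deposit has to wait for the whole read-modify-write:

START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- row stays locked until COMMIT
-- compute balance + amount in the application
UPDATE accounts SET balance = 110 WHERE id = 1;
COMMIT;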

Is this assumption correct? Or do I have a wrong understanding of transaction isolation levels?


You might be annoyed, because this must be the millionth question about this topic. The problem is: the bank account example is absolutely critical. Precisely where it should be absolutely clear what is going on, there seems to be so much misleading and contradictory information, and so many misconceptions.

Upvotes: 2

Views: 2561

Answers (3)

Tomek Samcik

Reputation: 546

The problem here is that you are asking what Isolation Level, as defined by the SQL Standard, is needed to sort out a concurrency anomaly that is not part of this definition.

The SQL Standard only defines how Isolation Levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable) map to the Dirty Read, Non-Repeatable Read and Phantom Read anomalies. There is no mention of Lost Update, so this - as you rightly pointed out - depends on how isolation levels are implemented by a specific DBMS.

Supposedly REPEATABLE_READ is enough to prevent a lost update on PostgreSQL, while SERIALIZABLE is needed to prevent it on MySQL and Oracle.
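
As an illustration (just a sketch, assuming the accounts(id, balance) table from the question), PostgreSQL under REPEATABLE READ aborts the second writer instead of letting it overwrite the first:

-- Session A:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;        -- reads 100

-- Session B:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;        -- also reads 100

-- Session A:
UPDATE accounts SET balance = 110 WHERE id = 1;
COMMIT;

-- Session B:
UPDATE accounts SET balance = 150 WHERE id = 1;
-- fails with something like: ERROR: could not serialize access due to concurrent update
ROLLBACK;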

Here are some interesting posts about Oracle and PostgreSQL/MySQL

Upvotes: 5

SQLpro

Reputation: 5177

Lost update is a transactional anomaly that occurs only if the transaction uses optimistic locking. It will never happen with pessimistic locking.

  • Some RDBMSs offer optimistic locking only, which is the case for Oracle Database and PostgreSQL
  • Other RDBMSs offer only pessimistic locking, which is the case for IBM DB2
  • And finally, Microsoft SQL Server can use either optimistic or pessimistic locking, depending on the user's choice, with pessimistic behavior as the default (see the sketch below)
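
For illustration (a sketch only; MyDb is a placeholder database name), switching SQL Server from its default pessimistic behavior to optimistic, row-versioning reads is a database-level setting:

-- READ COMMITTED now reads row versions instead of taking shared locks
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;

-- allows sessions to use the SNAPSHOT isolation level
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;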

So the question comes down to which RDBMS you use and which type of locking you have...

Some more information...

Guaranteeing the successful completion of a transaction that performs writes is only possible if you start by locking in exclusive mode and keep the locks for the duration of the transaction, i.e. if the lock mode is pessimistic and not optimistic. Even so, this technique will not prevent deadlocks...

The mathematician Edsger Dijkstra addressed this last problem (the Banker's algorithm) by showing that it is necessary, before starting to update the data (INSERT, UPDATE, DELETE...), to acquire all the locks needed to protect the data being handled, which amounts to having exclusive access to all the data the processing touches... Dijkstra won the Turing Award for his contributions to computer science!

In other words, having only one user accessing the database! ...

To summarize...

Which transactional anomalies each isolation level avoids is given in the following table:

[Table image: isolation levels and the transactional anomalies they prevent]

Upvotes: 2

David Browne - Microsoft

Reputation: 89406

In SQL Server both REPEATABLE READ and SERIALIZABLE will prevent the lost update by failing one transaction with a deadlock. In these isolation levels each session will take and hold a shared (S) lock on the target row during the initial SELECT. Then each session will try to get an exclusive (X) lock on the row to update it, causing a deadlock.
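
A sketch of that interleaving, assuming the accounts(id, balance) table from the question:

-- Session A:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;        -- takes and holds an S lock

-- Session B:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;        -- takes and holds its own S lock

-- Session A:
UPDATE accounts SET balance = 110 WHERE id = 1;   -- needs an X lock, blocks on B's S lock

-- Session B:
UPDATE accounts SET balance = 150 WHERE id = 1;   -- needs an X lock, blocks on A's S lock
-- deadlock: one session is chosen as the victim and rolled back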

If you want to avoid the lost update without a deadlock, by having one session wait until the other has completed, you must take a more exclusive lock before or during the initial SELECT. The normal pattern for this is to add an UPDLOCK hint to the initial SELECT to indicate a "select for update". And with "select for update" there's no reason to raise the transaction isolation level.
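
Something along these lines (a sketch, again with the accounts table):

BEGIN TRANSACTION;
SELECT balance FROM accounts WITH (UPDLOCK) WHERE id = 1;  -- U lock: a concurrent deposit waits here
-- compute balance + amount in the application
UPDATE accounts SET balance = 110 WHERE id = 1;
COMMIT;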

Oracle and PostgreSQL also have "select for update" syntax you can use.

Upvotes: 0
