MjeOsX
MjeOsX

Reputation: 405

'Lost update' vs 'Write skew'

Can somebody explain me whats exactly the difference between a 'lost update' and a 'write skew' in database transaction theory? Can somebody give me an example?

Upvotes: 17

Views: 4640

Answers (2)

  • Lost update is that two transactions read the same row to update it but the first committed update is overwritten by the second committed update.

  • Write skew is that two transactions read the same data to insert, update or delete rows but the committed data is inconsistent.

First, this is the example of lost update below. There is product table with id, name and stock as shown below.

product table:

id name stock
1 Apple 10
2 Orange 20

These steps below shows lost update. *The stocks of the products decrease when customers buy the products:

Flow Transaction 1 (T1) Transaction 2 (T2) Explanation
Step 1 BEGIN; T1 starts.
Step 2 BEGIN; T2 starts.
Step 3 SELECT stock FROM product WHERE id = 2;

20
T1 reads 20 which is updated later to 13 because a customer buys 7 oranges.
Step 4 SELECT stock FROM product WHERE id = 2;

20
T2 reads 20 which is updated later to 16 because a customer buys 4 oranges.
Step 5 UPDATE product SET stock = 13 WHERE id = 2; T1 updates 20 to 13.
Step 6 COMMIT; T1 commits.
Step 7 UPDATE product SET stock = 16 WHERE id = 2; T2 updates 13 to 16 after T1 commits.
Step 8 COMMIT; T2 commits.

*Lost update occurs.

Second, this is the example of write skew. There is doctor table with id, name and on_call as shown below.

doctor table:

id name on_call
1 John True
2 Lisa True

These steps below shows write skew. *At least one doctor must be on call but finally, no doctor is on call:

Flow John's Transaction (JT) Lisa's Transaction (LT) Explanation
Step 1 BEGIN; JT starts.
Step 2 BEGIN; LT starts.
Step 3 SELECT count(*) FROM doctor WHERE on_call = True;

2
JT reads 2 so John can take a rest.
Step 4 SELECT count(*) FROM doctor WHERE on_call = True;

2
LT reads 2 so Lisa can take a rest.
Step 5 UPDATE doctor SET on_call = False WHERE id = 1; JT updates True to False which means John takes a rest.
Step 6 COMMIT; JT commits.
Step 7 UPDATE doctor SET on_call = False WHERE id = 2; LT updates True to False which means Lisa takes a rest.
Step 8 COMMIT; LT commits.

John and Lisa both take a rest.

*Write skew occurs.

Next, this is also the example of write skew. There is event table with id, name and user as shown below.

event table:

id name user
1 Make Sushi John
2 Make Sushi Tom

These steps below shows write skew. *Only 3 users can join the event "Make Sushi" but finally, 4 users join it:

Flow Transaction (JT) Transaction (LT) Explanation
Step 1 BEGIN; T1 starts.
Step 2 BEGIN; T2 starts.
Step 3 SELECT count(*) FROM event WHERE name = 'Make Sushi';

2
T1 reads 2 so only one user can join it.
Step 4 SELECT count(*) FROM event WHERE name = 'Make Sushi';

2
T2 reads 2 so only one user can join it.
Step 5 INSERT INTO event values ('Make Sushi', 'Lisa'); T1 inserts Lisa to event table.
Step 6 COMMIT; T1 commits.
Step 7 INSERT INTO event values ('Make Sushi', 'Kai'); T2 inserts Kai to event table.
Step 8 COMMIT; T2 commits.

4 users join it.

*Write skew occurs.

These tables below are my experiment results of lost update and write skew with each isolation level in MySQL and PostgreSQL. *Yes means Occurs, No means Doesn't occur and SFU means SELECT FOR UPDATE.

MySQL:

Isolation Level Lost Update Lost Update with SFU Write Skew Write Skew with SFU
READ UNCOMMITTED Yes No Yes No
READ COMMITTED Yes No Yes No
REPEATABLE READ Yes No Yes No
SERIALIZABLE No No No No

PostgreSQL:

Isolation Level Lost Update Lost Update with SFU Write Skew Write Skew with SFU
READ UNCOMMITTED Yes No Yes No
READ COMMITTED Yes No Yes No
REPEATABLE READ No No Yes No
SERIALIZABLE No No No No

Upvotes: 8

Joel Stevick
Joel Stevick

Reputation: 2038

Informally, lost updates and write skew are ways that concurrent write transactions can interfere with each other.

Write skew happens when an update is made within a transaction based upon stale data. Stale data is a value read by a transaction that has become stale due to a subsequent committed write from a concurrent transaction.

Lost updates happen when a committed value written by one transaction is overwritten by a subsequent committed write from a concurrent transaction. In fact, lost update is really a special case of write skew; where updates are applied to data that has become stale.

Consider the case where a database for a retail store maintains an Inventory table. The database does not implement transaction isolation.

The Inventory table has a "ProductId" column and an "InStock" column that counts the number of items that are currently in stock for a particular product. Each purchase (transaction) decrements the "InStock" value by the number of items purchased.

Imagine that the store has two electric shavers (of a specific model) in stock.

Two customers each purchase one of these shavers, simultaneously.

Each of the concurrent purchases (transactions) reads the same value (two) from the shaver's "InStock" record. The transactions each decrement the "InStock" counter and commit the updated value (one) to the database. After both of the concurrent transactions have committed, the counter will incorrectly indicate that the shaver is still in stock (one item remaining).

One of the updates was lost.

Suppose the database implements snapshot Isolation (with lost update detection), in this case lost updates don't happen. This is because snapshot isolation detects when a lost update has occurred. After a transaction commits data, concurrent transactions that attempt to commit writes for the same data are aborted by the database. In our example, the process for which the transaction is aborted starts a new transaction to re-read the "InStock" column, decrement it, and commit the updated value. Assuming no other conflicts, this attempt to update the record commits successfully and the "Instock" column contains the (correct) value zero.

Transaction isolation is a deep topic.

Furthermore assume that the database records inventory history in an InventoryHistory table. The InventoryHistory table has the columns "Timestamp", "ProductId", and "InStock" (remaining after purchase). By design, the update to the InventoryHistory table is the last operation in a purchase transaction. After the two transactions commit, the respective InventoryHistory records will each reflect an "Instock" value of one -- this is incorrect, since one of the records should reflect an "Instock" value of zero. The incorrect InventoryHistory record is an example of write skew.

In this case snapshot isolation did not prevent anomalous data from being written to the database, since no updates were lost. Rather, the data written was anomalous because a value that was read by the transaction had become stale -- this is write skew. Snapshot isolation does not prevent write skew. To prevent write skew, the database must implement serializable isolation.

Read this article for a rigorous discussion of write skew, serializability and snapshot isolation.

Upvotes: 15

Related Questions