How can this SQL command cause phantom read?

Question

In short, my professor said the following transactions are susceptible to phantom read if they're both left to default isolation level (READ COMMITTED)

BEGIN;
UPDATE product SET price = 100 WHERE type='Electronics';
COMMIT;


BEGIN;
UPDATE product SET price = 100 WHERE price < 100;
COMMIT;

I can't really seem to be able to figure how a phantom read could happen.

He also said that, to fix this, you'd have to set the second transaction to REPEATABLE READ

So... why? How could a phantom read happen here, and why does REPEATABLE READ fixes it?

EDIT: could this be the case?
Say we have an initial product P that has type=Electronics AND price=1049

T1 would begin, and add P to the set of rows to consider.

T2 would begin, and ignore P (its price is below 1050).

T1 would increment its price to 1100, and COMMITs. Now T2 should update its rows set and include P.

But since in READ COMMITTED a transaction will get an updated snapshot only if changes are made to rows that were within the SET they are considering, the change goes unnotified.

T2, therefore, simply ignores P, and COMMITs.

This scenario was suggested by the example I found in postgresql docs, on the isolation level page, read committed paragraph.

Do you think this is a possible scenario and, hopefully, what my professor meant?

Laurenz Albe · Accepted Answer

A phantom read means that if you run the same SELECT twice in a transaction, the second one could get different results than the first.

In the words of the SQL standard:

SQL-transaction T1 reads the set of rows N that satisfy some
.
SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the used by SQL-transaction T1.
If SQL-transaction T1 then repeats the initial read with the same , it obtains a different collection of rows.

This can be caused by concurrent data modifications like the ones you quote at low isolation levels. This is because each query will see all committed data, even if they were committed after the transaction started.

You could also speak of phantom reads in the context of an UPDATE statement, since it also reads from the table. Then the same UPDATE can affect different statements if it is run twice.

However, it makes no sense to speak of phantom reads in the context of the two statements in your question: The second one modifies the column it is searching for, so the second execution will read different rows, no matter if there are concurrent data modifications or not.

Note: The SQL standard does not require that REPEATABLE READ transactions prevent phantom reads — this is only guaranteed with SERIALIZABLE isolation.

In PostgreSQL phantom reads are already impossible at REPEATABLE READ isolation, because it uses snapshots that guarantee a stable view of the database.

How can this SQL command cause phantom read?

Answers (2)

Related Questions