codependent

Reputation: 24452

Why isn't REPEATABLE_READ on MariaDB producing phantom reads?

In my tests I have seen that, when using MariaDB, repeating the same query inside a REPEATABLE_READ transaction does not produce phantom reads, even though I expected it to.

For instance:

I have two rows in the bank_account table:

  ID |  OWNER | MONEY
------------------------
  1  |  John  | 1000
  2  |  Louis | 2000

The expected flow should be as shown below:

THREAD 1 (REPEATABLE_READ)                THREAD 2 (READ_UNCOMMITTED)
  |                                         |
findAll()->[1|John|1000,2|Louis|2000]       |          
  |                                         |
  |                                       updateAccount(1, +100)
  |                                       createAccount("Charles", 3000)                 
  |                                       flush()
  |                                         |
  |                                         commitTx()
  |                                         |_
  |                                         
findAll()->[1|John|1000,2|Louis|2000,       
  |         3|Charles|3000]                 
  |                                         
  |                                         
 commitTx()                               
  |_                                        

To sum up: after Thread2 executes createAccount("Charles", 3000) and commits, Thread1 would query all rows again and should get

  ID |  OWNER   | MONEY
------------------------
  1  |  John    | 1000
  2  |  Louis   | 2000
  3  |  Charles | 3000

Thread1 is protected from uncommitted changes, seeing [1, John, 1000] instead of [1, John, 1100], but it is supposed to see the newly inserted row.

However, what Thread1 retrieves in the second findAll() is exactly the same result set as in the first findAll():

  ID |  OWNER   | MONEY
------------------------
  1  |  John    | 1000
  2  |  Louis   | 2000

No phantom read occurs. Why?

This is the code executed by Thread1:

@Transactional(readOnly=true, isolation=Isolation.REPEATABLE_READ)
@Override
public Iterable<BankAccount> findAllTwiceRepeteableRead(){
    printIsolationLevel();
    Iterable<BankAccount> accounts = baDao.findAll();
    logger.info("findAllTwiceRepeteableRead() 1 -> {}", accounts);
    //PAUSE HERE
    ...
}

I pause the execution where it says //PAUSE HERE.

Then Thread2 executes:

bankAccountService.addMoneyReadUncommited(ba.getId(), 100);
bankAccountService.createAccount("Charles", 3000);

And then Thread1 resumes:

//PAUSE HERE
...
Iterable<BankAccount> accounts = baDao.findAll();
logger.info("findAllTwiceRepeteableRead() 2 -> {}", accounts);

UPDATE: I've updated the thread transaction flows with what I'm really doing (I am committing the second transaction after the new row is inserted).

This matches what, according to Wikipedia, is a phantom read, and I think it is the very same scenario. So I still don't get why I'm not seeing the phantom read [3|Charles|3000]:

A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first.

This can occur when range locks are not acquired on performing a SELECT ... WHERE operation. The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates (i.e. INSERT) new rows (in the target table) which fulfill that WHERE clause.

Transaction 1                             Transaction 2
/* Query 1 */
SELECT * FROM users
WHERE age BETWEEN 10 AND 30;
                                          /* Query 2 */
                                          INSERT INTO users(id,name,age) VALUES ( 3, 'Bob', 27 );
                                          COMMIT;
/* Query 1 */
SELECT * FROM users
WHERE age BETWEEN 10 AND 30;
COMMIT;

Upvotes: 5

Views: 560

Answers (1)

Shadow

Reputation: 34231

What you described as the actual behaviour is in fact the correct behaviour for REPEATABLE_READ. The behaviour you are expecting can be achieved by using READ_COMMITTED.
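For completeness, the Spring-side change needed to get the behaviour the OP expects is just the isolation attribute. This is a sketch mirroring the method from the question; the method name used here is hypothetical:

@Transactional(readOnly=true, isolation=Isolation.READ_COMMITTED)
@Override
public Iterable<BankAccount> findAllTwiceReadCommitted(){
    // Under READ_COMMITTED each non-locking read sees the latest committed
    // data, so a second findAll() after the pause would also return the row
    // inserted and committed by the concurrent transaction.
    printIsolationLevel();
    Iterable<BankAccount> accounts = baDao.findAll();
    logger.info("findAllTwiceReadCommitted() 1 -> {}", accounts);
    //PAUSE HERE
    ...
}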

As the MariaDB documentation on REPEATABLE READ says (bolding is mine):

there is an important difference from the READ COMMITTED isolation level: All consistent reads within the same transaction read the snapshot established by the first read.

In thread 1, the first findAll() call, returning John and Louis, established the snapshot. The second findAll() simply used the same snapshot.

This is further corroborated by a Percona blog post on Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels:

In REPEATABLE READ, a ‘read view’ ( trx_no does not see trx_id >= ABC, sees < ABB ) is created at the start of the transaction, and this read view (consistent snapshot in Oracle terms) is held open for the duration of the transaction. If you execute a SELECT statement at 5AM, and come back in an open transaction at 5PM, when you run the same SELECT, then you will see the exact same resultset that you saw at 5AM. This is called MVCC (multiple version concurrency control) and it is accomplished using row versioning and UNDO information.
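The snapshot-at-first-read rule can be illustrated without a database at all. The following is a toy in-memory model of the visibility rule only (it is not real MVCC, row versioning, or locking; all class names are invented for the sketch): a REPEATABLE READ transaction pins the committed state seen by its first read, while a READ COMMITTED transaction always reads the latest committed state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotDemo {
    // Committed table states; the index in the list acts as a commit "version".
    static final List<Map<Integer, String>> versions = new ArrayList<>();
    static {
        versions.add(new HashMap<>(Map.of(1, "John:1000", 2, "Louis:2000")));
    }

    // A committing transaction appends a new committed state.
    static void commit(Map<Integer, String> newState) {
        versions.add(new HashMap<>(newState));
    }

    // REPEATABLE READ: the snapshot is pinned by the first read.
    static class RepeatableReadTx {
        private Integer pinnedVersion = null;
        Map<Integer, String> findAll() {
            if (pinnedVersion == null) pinnedVersion = versions.size() - 1;
            return versions.get(pinnedVersion);
        }
    }

    // READ COMMITTED: every read sees the latest committed state.
    static class ReadCommittedTx {
        Map<Integer, String> findAll() {
            return versions.get(versions.size() - 1);
        }
    }

    public static void main(String[] args) {
        RepeatableReadTx rr = new RepeatableReadTx();
        Map<Integer, String> first = rr.findAll(); // establishes the snapshot

        // A concurrent transaction commits an update and an insert.
        Map<Integer, String> changed = new HashMap<>(first);
        changed.put(1, "John:1100");
        changed.put(3, "Charles:3000");
        commit(changed);

        System.out.println("RR second read: " + rr.findAll()); // same as first read
        System.out.println("RC read:        " + new ReadCommittedTx().findAll()); // sees Charles
    }
}
```

Running this prints the pinned snapshot for the REPEATABLE READ transaction's second read, while the READ COMMITTED read picks up Charles's row, which is exactly the difference between the two isolation levels described above.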

UPDATE

Caveat: The following references are from the MySQL documentation. However, since these references relate to the InnoDB storage engine, I firmly believe that they apply to MariaDB's InnoDB storage engine as well.

So, in the InnoDB storage engine, under the REPEATABLE READ isolation level, non-locking selects within the same transaction read from the snapshot established by the first read. No matter how many records are inserted, updated, or deleted in concurrent committed transactions, the reads will be consistent. Period.

This is the scenario the OP describes in the question. It would imply that a non-locking read at the REPEATABLE READ isolation level can never produce a phantom read, right? Well, not exactly.

As MySQL documentation on InnoDB Consistent Nonlocking Reads says:

The snapshot of the database state applies to SELECT statements within a transaction, not necessarily to DML statements. If you insert or modify some rows and then commit that transaction, a DELETE or UPDATE statement issued from another concurrent REPEATABLE READ transaction could affect those just-committed rows, even though the session could not query them. If a transaction does update or delete rows committed by a different transaction, those changes do become visible to the current transaction. For example, you might encounter a situation like the following:

SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz';
-- Returns 0: no rows match.
DELETE FROM t1 WHERE c1 = 'xyz';
-- Deletes several rows recently committed by other transaction.

SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc';
-- Returns 0: no rows match.
UPDATE t1 SET c2 = 'cba' WHERE c2 = 'abc';
-- Affects 10 rows: another txn just committed 10 rows with 'abc' values.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'cba';
-- Returns 10: this txn can now see the rows it just updated.

To sum up: if you use InnoDB with the REPEATABLE READ isolation level, then phantom reads may occur when data modification statements within the current transaction interact with rows committed by concurrent transactions.

The linked Wikipedia article on isolation levels describes a general theoretical model. You always need to read the actual product manual how a certain feature is implemented because there may be differences.

In the Wikipedia article, only locks are described as a means of preventing phantom reads. However, InnoDB uses the creation of the snapshot to prevent phantom reads in most cases, so there is no need to rely on locks.

Upvotes: 3
