McAbra

Reputation: 2514

Django MySQL REPEATABLE READ "data loss"

I'm looking for information about what is behind this entry in the Django 2.0 release notes:

MySQL’s default isolation level, repeatable read, may cause data loss in typical Django usage. To prevent that and for consistency with other databases, the default isolation level is now read committed. You can use the DATABASES setting to use a different isolation level, if needed.
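For reference, the DATABASES setting referred to there is the MySQL backend's isolation_level option. A minimal sketch of pinning it explicitly in settings.py (database name and credentials are placeholders):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',          # placeholder
        'USER': 'myuser',        # placeholder
        'PASSWORD': 'secret',    # placeholder
        'OPTIONS': {
            # One of 'read uncommitted', 'read committed',
            # 'repeatable read', 'serializable', or None to
            # use the server's configured default.
            'isolation_level': 'read committed',
        },
    }
}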

As I understand it, repeatable read is "stricter" than read committed, so the question of what Django is doing to produce "data loss" has been bugging me for some time now.

Is it stuff like prefetch_related? Or can making an UPDATE based on potentially stale data (SELECTed earlier in a thread) in general be considered data loss? Or even better: maybe there's something that only MySQL does, or a bug it has, that makes it dangerous on repeatable read?

Thank you.

Upvotes: 0

Views: 379

Answers (1)

Kevin Christopher Henry

Reputation: 48952

There are actually two issues with REPEATABLE READ worth noting here.

One is that Django's internal code is written with the expectation that transactions obey the READ COMMITTED semantics. The fact that REPEATABLE READ is actually a higher isolation level doesn't matter; the point is that it violates the expectations of Django's code, leading to incorrect behavior. (In fact, adding a setting to change the isolation level was initially resisted because "it would imply that Django works correctly under any isolation level, and I don't think that's true".)

A straightforward example (first noted in the issue tracker 9 years ago) is the behavior of get_or_create(). It works by first trying to read the row and then, if that read fails, trying to create it. If that creation operation fails, it's presumably because some other transaction has created the row in the meantime. Therefore, it tries again to read and return it. That works as expected in READ COMMITTED, but in REPEATABLE READ that final read won't find anything, because the read must return the same results (none) that it found the first time.
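To make that concrete, here's a rough sketch of the pattern get_or_create() follows (simplified, not Django's actual implementation; the function name is made up for illustration):

from django.db import IntegrityError

def get_or_create_sketch(model, **kwargs):
    try:
        return model.objects.get(**kwargs), False        # first read: row absent
    except model.DoesNotExist:
        try:
            return model.objects.create(**kwargs), True  # insert attempt
        except IntegrityError:
            # A concurrent transaction inserted the row first. Under
            # READ COMMITTED this retry sees that committed row; under
            # REPEATABLE READ it must repeat the first (empty) result,
            # so DoesNotExist is raised again even though the row exists.
            return model.objects.get(**kwargs), False    # second read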

That's not data loss, though. The second issue is specific to MySQL and the non-standard way it defines the behavior of REPEATABLE READ. Roughly speaking, reads will behave as in REPEATABLE READ but writes will behave as in READ COMMITTED, and that combination can lead to data loss. This is best demonstrated with an example, so let me quote this one, provided by core contributor Shai Berger:

(1) BEGIN TRANSACTION

(2) SELECT ... FROM some_table WHERE some_field=some_value  
     (1 row returned)

(3) (some other transactions commit)

(4) SELECT ... FROM some_table WHERE some_field=some_value
     (1 row returned, same as above)

(5) DELETE FROM some_table WHERE some_field=some_value
     (answer: 1 row deleted)

(6) SELECT ... FROM some_table WHERE some_field=some_value
     (1 row returned, same as above)

(7) COMMIT
     (the row that was returned earlier is no longer in the database)

Take a minute to read this. Up to step (5), everything is as you would expect; you should find steps (6) and (7) quite surprising.

This happens because the other transactions in (3) deleted the row that is returned in (2), (4) & (6), and inserted another one where some_field=some_value; that other row is the row that was deleted in (5). The row that this transaction selects was not seen by the DELETE, and hence not changed by it, and hence continues to be visible to the SELECTs in our transaction. But when we commit, the row (which has been deleted) no longer exists.
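If you want to reproduce this outside Django, a rough sketch with two raw connections might look like the following (assuming PyMySQL and a hypothetical some_table(id, some_field) that starts with one row where some_field = 1):

import pymysql

def connect():
    # Placeholder connection parameters.
    return pymysql.connect(host='localhost', user='u', password='p',
                           database='test', autocommit=False)

conn_a, conn_b = connect(), connect()
a, b = conn_a.cursor(), conn_b.cursor()

# MySQL's default, but set it explicitly for transaction A.
a.execute("SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ")

a.execute("SELECT * FROM some_table WHERE some_field = 1")   # step (2)
print(a.fetchall())                                          # 1 row; snapshot taken

# Step (3): another transaction replaces the row and commits.
b.execute("DELETE FROM some_table WHERE some_field = 1")
b.execute("INSERT INTO some_table (id, some_field) VALUES (2, 1)")
conn_b.commit()

a.execute("SELECT * FROM some_table WHERE some_field = 1")   # step (4)
print(a.fetchall())                                          # same row as before

a.execute("DELETE FROM some_table WHERE some_field = 1")     # step (5)
print(a.rowcount)                                            # 1: deletes the *new* row

a.execute("SELECT * FROM some_table WHERE some_field = 1")   # step (6)
print(a.fetchall())                                          # snapshot row still visible

conn_a.commit()                                              # step (7): table is now empty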

The tracker issue that led to this change in the default isolation level gives additional detail on the problem, and it links to other discussions and issues if you want to read more.

Upvotes: 1
