Reputation: 295
I understand that PostgreSQL uses MVCC to handle parallel transactions but I am not sure what will happen in the following scenario:
Client1 Client2 Time
-------------------------------------------------------------------
BEGIN BEGIN time=t0
Update row where id=1 Update row where id=1 time=t1
COMMIT COMMIT time=t2
Both clients start their transactions at time=t0, update the same row (id=1) in their respective snapshots at time=t1, and commit at time=t2. Since the commits happen at the same moment (time=t2), will PostgreSQL execute them in parallel, or will it make them sequential? If they run in parallel, is there a chance of data corruption? Thanks.
Edit: "COMMIT" here is the SQL command sent via libpq. In psql, COMMIT = END.
Upvotes: 0
Views: 4906
Reputation: 324501
This post is full of misapprehensions.
There's no threading involved here, so it's not about being "thread safe" at all. PostgreSQL uses multiple processes, one per connection. Client applications may not share a single connection between threads unless they do appropriate locking to stop more than one thread from using it at a time.
What you're asking is about concurrency, but not threading. And you're not really asking about "thread safety" but about atomicity and isolation. See the PostgreSQL documentation chapter on concurrency control.
In this case:
In this case, assuming the default READ COMMITTED isolation level, where each subsequent statement gets its own snapshot:

Both UPDATEs affect the same row. They will lock the row, and one will wait until the other commits or rolls back. At that point the waiting one will re-check whether its WHERE clause still matches, then proceed with its own changes.

COMMIT and BEGIN here are noise; they don't matter, because if you ran the statements in isolation they would get automatically wrapped in single-statement transactions.

So, as for:
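A minimal two-session sketch of this row-lock behaviour, assuming a hypothetical table t(id int primary key, val int) with one row where id = 1:

```sql
-- Session 1                          -- Session 2
BEGIN;
UPDATE t SET val = 10 WHERE id = 1;
                                      BEGIN;
                                      UPDATE t SET val = 20 WHERE id = 1;
                                      -- ...blocks, waiting on session 1's row lock
COMMIT;
                                      -- unblocks, re-checks that WHERE id = 1
                                      -- still matches, applies its own UPDATE
                                      COMMIT;
-- Final state: val = 20 (the later committer wins under READ COMMITTED)
```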
As the commits are done at the same moment (time=t2)
They cannot be done at exactly the same time, since they take brief locks to update some shared memory data structures and write to the transaction logs. But they do overlap. Logically, one still commits before the other.
You seem to think that statements only take effect when you commit, though. That's not true: statements run immediately, so those UPDATEs execute as soon as you send them. The changes just don't become visible to other transactions until they commit.
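This is easy to observe with the same hypothetical table t: the updating session sees its own change at once, while other sessions only see it after COMMIT:

```sql
-- Session 1
BEGIN;
UPDATE t SET val = 10 WHERE id = 1;
SELECT val FROM t WHERE id = 1;   -- sees 10: its own uncommitted change

-- Session 2, concurrently
SELECT val FROM t WHERE id = 1;   -- still sees the old value

-- Session 1
COMMIT;

-- Session 2
SELECT val FROM t WHERE id = 1;   -- now sees 10
```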
If they run in parallel, is there a chance of data corruption?
No, that's the whole point of PostgreSQL and other RDBMSes. There won't be data corruption.
In this case the two UPDATEs will sort it out between themselves using row locking. If you try exactly what you're doing here in two concurrent sessions, but do not commit, and then query pg_locks, you will see that one is waiting on the other.
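For example, while the second UPDATE is blocked, a third session can inspect the lock queue. These queries use only the standard pg_locks and pg_stat_activity views (the scenario's table name t is hypothetical):

```sql
-- Run in a third session while the second UPDATE is blocked.
SELECT pid, locktype, mode, granted
FROM pg_locks
WHERE NOT granted;                 -- the waiting backend's ungranted lock

-- On PostgreSQL 9.6+, pg_blocking_pids() shows which backend it waits for:
SELECT pid, pg_blocking_pids(pid)
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```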
What there can be is results different to what would happen if you ran the two transactions serially instead of concurrently. This is discussed, in detail and with examples, in the transaction isolation chapters of the PostgreSQL documentation.
Transactions are not magic secret sauce that lets you ignore concurrency issues. If you use SERIALIZABLE transactions they're not too far from it, but your app has to be prepared to retry any transaction, since they can fail at any time due to serialisation failures. (Well-written apps can do this anyway, though, since even READ COMMITTED transactions can fail due to deadlocks etc.)
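Under SERIALIZABLE the same two-session scenario no longer quietly applies the second UPDATE on top of the first; one transaction is aborted instead and must be retried. A sketch, again assuming the hypothetical table t:

```sql
-- Session 1                          -- Session 2
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET val = 10 WHERE id = 1;
                                      BEGIN ISOLATION LEVEL SERIALIZABLE;
                                      UPDATE t SET val = 20 WHERE id = 1;
                                      -- blocks until session 1 commits...
COMMIT;
                                      -- ERROR: could not serialize access
                                      --        due to concurrent update
                                      -- (SQLSTATE 40001: the application
                                      --  should ROLLBACK and retry)
```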
Upvotes: 5