Saurabh
Saurabh

Reputation: 295

Is Postgresql transaction "COMMIT" thread safe?

I understand that PostgreSQL uses MVCC to handle parallel transactions but I am not sure what will happen in the following scenario:

Client1                      Client2                        Time
-------------------------------------------------------------------
BEGIN                         BEGIN                        time=t0  
Update row where id=1    Update row where id=1             time=t1  
COMMIT                        COMMIT                       time=t2  

Both the clients will start the transaction at time=t0, update the same row with id=1 in their respective snapshots at time=t1 and will commit the transactions at time=t2. As the commits are done at the same moment (time=t2), will they be executed in parallel or they will be made sequential by the PostgreSQL? If they run in parallel, is there a chance of data corruption? Thanks.

Edit: "COMMIT" is used with libPQ. In psql, COMMIT = END.

Upvotes: 0

Views: 4906

Answers (1)

Craig Ringer
Craig Ringer

Reputation: 324501

This post is full of misapprehensions.

There's no threading involved here, so it's not "thread safe" at all. PostgreSQL uses multiple processes, one per connection. Client applications may not share a single connection between threads unless they ensure they do appropriate locking to stop more than one thread using it at a time.

What you're asking is about concurrency, but not threading. And you're not really asking about "thread safety" but about atomicity and isolation. See the PostgreSQL documentation chapter on concurrency control.

In this case:

  • Snapshots are taken at the first statement that accesses the db, so they're taken at time t1 (update), not t2 (commit).
  • In the default isolation level READ COMMITTED each subsequent statement gets its own snapshot
  • In this case both UPDATEs affect the same row. They will lock the row, and one will wait until the other commits or rolls back. At that time, the waiting one will do a re-check to see if its WHERE clause still matches, then it will proceed with its own changes.
  • The explicit COMMIT and BEGIN here are noise, they don't matter because if you run the statements in isolation they get automatically wrapped in single statement transactions.

So

As the commits are done at the same moment (time=t2)

They cannot be done at exactly the same time, since they take brief locks to update some shared memory data structures and write to the transaction logs. But they do overlap. Logically, one still commits before the other.

You seem to think that statements only take effect when you commit though. That's not true. Statements are run immediately, so those UPDATEs run as soon as you send them. The changes just don't become visible to other transactions until committed.

If they run in parallel, is there a chance of data corruption?

No, that's the whole point of PostgreSQL and other RDBMSes. There won't be data corruption.

In this case the two UPDATEs will sort it out between themselves by using row locking. If you try exactly what you're doing here, in two concurrent sessions, but do not commit them, then query pg_locks you will see that one is waiting on the other.

What there can be is results different to what would happen if you ran the two transactions serially instead of concurrently. This is discussed, in detail and with examples, in the transaction isolation chapters of the PostgreSQL documentation.

Transactions are not magic secret sauce that let you ignore concurrency issues. If you use SERIALIZABLE transactions they're not too far from it, but your app has to be prepared to retry any transaction since they can fail at any time due to serialisation failures. (Well written apps can do this anyway, though, since even READ COMMITTED xacts can fail due to deadlocks etc).

Upvotes: 5

Related Questions