insumity

Reputation: 5459

How to make PostgreSQL functions atomic?

Assume I have some PostgreSQL functions like the following:

CREATE FUNCTION insertSth() RETURNS void AS $$
BEGIN
    INSERT INTO ...;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION removeSthAfterSelect() RETURNS TABLE(...) AS $$
DECLARE
    some_id ...;
BEGIN
    SELECT id INTO some_id ...;
    RETURN QUERY SELECT * FROM ...;
    DELETE FROM ... WHERE id = some_id;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION justDeleteSth() RETURNS void AS $$
BEGIN
    DELETE FROM ...;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION justSelectSth() RETURNS TABLE(...) AS $$
BEGIN
    RETURN QUERY SELECT * FROM ...;
END;
$$ LANGUAGE plpgsql;

From my understanding, the PostgreSQL functions insertSth, justDeleteSth and justSelectSth are going to be executed atomically (?), so parallel executions of them won't mess anything up.

But for removeSthAfterSelect, if there is a parallel execution, it could be that SELECT id INTO some_id ... finds something, then another transaction concurrently calls justDeleteSth and deletes the row with id = some_id. When the first transaction continues, DELETE FROM ... WHERE id = some_id; won't delete anything, meaning it messes things up.

Is this the case? Is there a way to avoid this problem? E.g. by saying that removeSthAfterSelect should be executed atomically?

Upvotes: 4

Views: 13492

Answers (2)

Bob

Reputation: 6173

It is often possible to achieve the desired "atomic" behaviour using locking, e.g.:

BEGIN;  -- transaction
SELECT pg_advisory_xact_lock(123);  -- 123 is an arbitrary bigint key chosen by the application

-- do your "atomic" stuff here, other transactions
-- trying to acquire the same (123) lock will be waiting for it to be released

COMMIT;  -- transaction has ended, the locks are released automatically

The drawback is that such locked blocks won't be executed in parallel. See the docs at https://www.postgresql.org/docs/11/explicit-locking.html for details.
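The same lock can also be taken inside a function. The following is only a sketch: the table items, the function name remove_oldest_item and the key 123 are made up for illustration and are not from the question.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS items (
    id      bigint PRIMARY KEY,
    payload text
);

CREATE OR REPLACE FUNCTION remove_oldest_item()
RETURNS SETOF items AS $$
BEGIN
    -- Waits until no other transaction holds advisory lock 123.
    -- The lock is released automatically when the calling transaction ends.
    PERFORM pg_advisory_xact_lock(123);

    -- Delete the row with the smallest id and return it to the caller.
    RETURN QUERY
    DELETE FROM items
    WHERE id = (SELECT min(id) FROM items)
    RETURNING *;
END;
$$ LANGUAGE plpgsql;
```

Note that an advisory lock only serializes callers that ask for the same key. Any other code path that deletes from the table (such as justDeleteSth in the question) would have to take lock 123 as well, otherwise it is not held back at all.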

Upvotes: 2

Craig Ringer

Reputation: 324275

A transaction has the property of atomic commit, i.e. the entire transaction is guaranteed to take effect, or none of it does.

That doesn't mean that transactions can't interact. In particular, in READ COMMITTED mode a transaction committing midway through another transaction can have visible effects. Even without that, concurrency anomalies are possible and normal. See the PostgreSQL chapter on concurrency control, particularly the transaction isolation section. Statements in functions are no more immune to concurrency issues than standalone statements.
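To make the isolation level concrete, here is a small sketch of running a block under SERIALIZABLE instead of the default READ COMMITTED. The table items is a hypothetical stand-in for whatever your functions touch.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS items (
    id      bigint PRIMARY KEY,
    payload text
);

-- Shows the level new transactions get by default (normally "read committed").
SHOW default_transaction_isolation;

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- All statements in this transaction see the same snapshot.
SELECT * FROM items WHERE id = 1;
DELETE FROM items WHERE id = 1;

-- If a concurrent transaction changed or deleted the same row, one of the
-- statements above or the COMMIT may fail with SQLSTATE 40001
-- (serialization_failure). The application is expected to retry the
-- whole transaction in that case.
COMMIT;
```

The stricter level does not make the block "atomic" in the sense asked about. It turns some silent anomalies into errors that the client has to handle by retrying.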

Even within a single statement it's possible to have concurrency issues. Statements are not magically atomic. People often think that if they can pack everything into a single query using CTEs, subqueries, etc., it'll be immune to concurrency issues. That is not the case.

There's no function label to say "execute this atomically" because the concept you're looking for just doesn't exist in the DBMS. The closest you'll get is to LOCK TABLE ... IN ACCESS EXCLUSIVE MODE all tables that the function uses, so that nothing else can touch them. That is usually rather excessive and unnecessary if you can reason effectively about concurrency and transaction isolation.
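For completeness, the heavy-handed table lock would look roughly like this. The table items is a hypothetical placeholder for the tables your function uses.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS items (
    id      bigint PRIMARY KEY,
    payload text
);

BEGIN;

-- Blocks every other session's access to the table, including plain SELECTs,
-- until this transaction commits or rolls back.
LOCK TABLE items IN ACCESS EXCLUSIVE MODE;

-- Statements that must not overlap with any other access go here.
SELECT id FROM items ORDER BY id LIMIT 1;
DELETE FROM items WHERE id = (SELECT min(id) FROM items);

COMMIT;  -- the table lock is released here
```

LOCK TABLE only works inside a transaction, and the lock is held until that transaction ends, so everything else touching the table queues up behind it.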

It's difficult to be more specific because you're using a very generalised example with all the details left out. For example, why does it matter if you attempt to delete the row twice?

A few concepts you should study:

  • Snapshots
  • READ COMMITTED vs SERIALIZABLE transaction isolation
  • Row and table level locks, both implicit (e.g. those taken by DML) and explicit (e.g. SELECT ... FOR UPDATE)
  • Transaction visibility
  • Predicate re-checks after a DML statement finishes waiting on a lock
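As an illustration of the explicit row lock from the list above, here is one way a function shaped like removeSthAfterSelect might use SELECT ... FOR UPDATE. This is a sketch under assumptions: the table items, its columns and the function name are invented, and picking the row with the lowest id is just a stand-in for your real selection logic.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS items (
    id      bigint PRIMARY KEY,
    payload text
);

CREATE OR REPLACE FUNCTION remove_item_after_select()
RETURNS SETOF items AS $$
DECLARE
    some_id bigint;
BEGIN
    -- Lock the chosen row. A concurrent UPDATE, DELETE or SELECT ... FOR UPDATE
    -- on the same row now waits until this transaction ends.
    SELECT id INTO some_id
    FROM items
    ORDER BY id
    LIMIT 1
    FOR UPDATE;

    -- Nothing was found (or the candidate row vanished while we waited).
    IF NOT FOUND THEN
        RETURN;
    END IF;

    RETURN QUERY SELECT * FROM items WHERE id = some_id;
    DELETE FROM items WHERE id = some_id;
END;
$$ LANGUAGE plpgsql;
```

In READ COMMITTED mode, combining ORDER BY and LIMIT with a locking clause can return no row even though other rows exist, if the candidate row was deleted by a transaction that committed while this one was waiting. Callers should treat an empty result as "try again" rather than "table is empty".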

As one example of concurrency in action, take a look at the upsert problem.


But for removeSthAfterSelect if there is a parallel execution it could be that SELECT id INTO some_id .. finds something, then concurrently another transaction calls justDeleteSth and deletes the row with id = someId, so when the transaction continues it won't delete anything here: DELETE FROM ... WHERE id = some_id; meaning it messes things up.

You're talking as if one transaction stops and the other runs, then the first continues. That's often not the case; things can run completely concurrently, with many statements happening truly simultaneously.

The main thing that limits that is row level locking. In this case, there's a race condition, as both DELETEs try to acquire the row update lock for the row. Whichever gets it will continue and delete the row. The other DELETE gets stuck on the row lock until the winning transaction commits or rolls back. If it rolls back, it's as if nothing happened and the waiting transaction continues as normal. If the winning transaction commits the delete, the waiting transaction sees the lock has been released, and (in READ COMMITTED mode) re-checks the WHERE clause predicate to make sure the row is still matched, discovers it doesn't exist anymore, and carries on without an error as it's not an error to delete zero rows.
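You can watch this happen by hand. The sketch below assumes a hypothetical table items containing a row with id = 1, and is meant to be run in two separate psql sessions in the order shown, both in the default READ COMMITTED mode.

```sql
-- Setup (run once, in either session). Table and data are hypothetical.
CREATE TABLE IF NOT EXISTS items (
    id      bigint PRIMARY KEY,
    payload text
);
INSERT INTO items (id, payload) VALUES (1, 'example') ON CONFLICT DO NOTHING;

-- Session A:
BEGIN;
DELETE FROM items WHERE id = 1;   -- acquires the row lock and deletes the row

-- Session B:
BEGIN;
DELETE FROM items WHERE id = 1;   -- blocks, waiting on session A's row lock

-- Session A:
COMMIT;                           -- releases the row lock

-- Session B is now unblocked. It re-checks the WHERE clause, finds that the
-- row no longer exists, and finishes having deleted zero rows, with no error.
COMMIT;
```

If session A issues ROLLBACK instead of COMMIT, session B's DELETE proceeds and removes the row itself.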

In PL/PgSQL you can check the affected row count if you want to enforce that a statement affects exactly one row, and RAISE EXCEPTION if it didn't match the expected number of affected rows. There's also INTO STRICT for SELECT.
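A minimal sketch of both checks follows. The table items and the function names delete_exactly_one and get_payload_strict are hypothetical and only serve the example.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS items (
    id      bigint PRIMARY KEY,
    payload text
);

-- Fails loudly unless the DELETE removed exactly one row.
CREATE OR REPLACE FUNCTION delete_exactly_one(target_id bigint)
RETURNS void AS $$
DECLARE
    affected bigint;
BEGIN
    DELETE FROM items WHERE id = target_id;

    -- ROW_COUNT holds the number of rows processed by the last SQL command.
    GET DIAGNOSTICS affected = ROW_COUNT;

    IF affected <> 1 THEN
        RAISE EXCEPTION 'expected to delete 1 row with id %, deleted %',
            target_id, affected;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- INTO STRICT raises NO_DATA_FOUND if no row matches and
-- TOO_MANY_ROWS if more than one row matches.
CREATE OR REPLACE FUNCTION get_payload_strict(target_id bigint)
RETURNS text AS $$
DECLARE
    result text;
BEGIN
    SELECT payload INTO STRICT result FROM items WHERE id = target_id;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```

An exception raised this way aborts the surrounding transaction unless the caller catches it, so a lost race shows up as an error instead of a silent no-op.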

Upvotes: 17
