Postgres - No actual UPDATE, has it side-effects?

Question

I have a table which holds about 50M rows. I want to execute a simple UPDATE query:

UPDATE the_table SET flag = true;

For 99% of these rows, the flag is already set to true. So only 1% of the rows must be changed.

My question is: is Postgres smart enough to know this? Or will Postgres change these 99% rows anyway, which will result in the typical processes such as WAL, auto-vacuum, re-indexing, synchronization to slaves, ... for the whole table, instead of just for these 1% rows.

In other words, is the following query a safer approach?

UPDATE the_table SET flag = true WHERE flag = false;

joanolo · Accepted Answer

Postgres won't distinguish between the case where flag=true or flag=false (or flag is null, for that matter) when performing the update. However, the results of the two statements are not completely equivalent; or, at least, they're not in general.

There are two major side effects:

The first query (no where clause) will scan and process all the rows in the table. In the second case, and having appropriate indices, it might only go and process a few rows. The end result in terms of "what's on the table" (barring the effects of triggers) will be the same. The time taken to achieve this end result might be substantially different.
If the table (or view) has a trigger that fires "ON UPDATE" and "FOR EVERY ROW" (see CREATE TRIGGER), the trigger function will be called for every row on the table with your first query, and only for the rows WHERE the where condition is true on the second. Again two differences: (1) time and (2) the action of the trigger. If the trigger would, for instance, update a "lastmod" column, it would update it for every row in the first case [which is probably not what you want].

... and a possible third one:

In a sufficiently concurrent scenario: the first query will block all the rows in the table; or the statement might be blocked from updating some rows because another transaction is simultaneously them. So, either long wait times or deadlock. The higher the number of rows updated, the higher the likelihood of having some effects on contention... (or deadlocks, etc.)

The most usual way of updating is using a query with a WHERE clause. Some very special cases might recommend the first one (for instance, you actually want a "lastmod" column updated, even if the rest of the values in the row were not).

"By default" go for the query with a WHERE. Some databases (i.e.: MySQL and safe update) might even not allow you to execute an UPDATE (or DELETE) without a WHERE clause.

Postgres - No actual UPDATE, has it side-effects?

Answers (2)

Related Questions