Update from multiple tables?

Question

I'm trying to construct a query in which I update table_a with data contained in both table_b and table_c.

All the UPDATE ... FROM docs and questions I've found seem to be focused on updating table_a with data in table_c by joining through table_b. I don't want that, I need to use data contained in both table_b and table_c, which are both left joined to table_a

An example select would be:

SELECT 
        a.id 
    ,   b.counted as count_b
    ,   c.counted as count_c
FROM 
    table_a a
LEFT JOIN
    table_b b on a.id = b.id
LEFT JOIN
    table_c c on a.id = c.id
;


 id | count_b | count_c
----+---------+---------
 34 |       6 |       2
 43 |      16 |       2
 45 |      33 |       5

Working 'one table'-updates are:

UPDATE table_a SET count_b = table_b.counted
FROM table_b WHERE table_a.id = table_b.id;

UPDATE table_a SET count_c = table_c.counted
FROM table_c WHERE table_a.id = table_c.id;

However I'd like to combine them for efficiency (update is an insert+delete, so there's a bit of savings to be had. But I can't seem to figure out how to join two tables onto table_a in an UPDATE ... FROM.

I could handle this with a subquery - but I'd prefer avoiding that.

Erwin Brandstetter · Accepted Answer

^{To shorten my longish answer I work with table names a, b, c.}
If a.id, b.id and c.id are UNIQUE (but can be missing):

Basic form with a number of caveats (keep reading!):

UPDATE a
SET    count_b = b.counted
     , count_c = c.counted
FROM   b
FULL   JOIN c USING (id)
WHERE  a.id = COALESCE(b.id, c.id)
;

Wouldn't work properly if any of the id columns are not UNIQUE - but neither would separate updates.

If nothing actually changes (all counted columns identical to what's in a already) you would just create a dead row for full cost. If that can be the case and you want to avoid it, append an additional WHERE clause:

AND   (a.count_b <> b.counted OR
       a.count_c <> c.counted)  -- simple form with all columns NOT NULL

FULL [OUTER] JOIN makes it work if some, but not all tables (one in the simple case of two source tables) does not have a row for the same id. It effectively emulates multiple LEFT JOIN you had in mind, with a small but important difference: You get phantom NULL values for columns from tables where the row is missing. Catch that with COALESCE:

SET    count_b = COALESCE(b.counted, a.count_b)

There is still a corner case if counted can be NULL. COALESCE cannot distinguish between two cases:
1. b.counted IS NULL.
2. b.counted does not exist.
If you need the distinction, use a CASE statement instead:

SET    count_b = CASE WHEN b.id IS NULL THEN a.count_b ELSE b.counted END

Adapt WHERE clauses accordingly.
The complete version, covering all NULL cases and avoiding empty updates:

UPDATE a
SET    count_b = CASE WHEN b.id IS NULL THEN a.count_b ELSE b.counted END
     , count_c = CASE WHEN c.id IS NULL THEN a.count_b ELSE c.counted END
FROM   b
FULL   JOIN c USING (id)
WHERE  a.id = COALESCE(b.id, c.id)
AND   (b.id IS NOT NULL AND b.counted IS DISTINCT FROM a.count_b OR
       c.id IS NOT NULL AND c.counted IS DISTINCT FROM a.count_c)
;

Looks complex, but should be faster if there is substantial overlap between the source tables.

Correlated subqueries

If there is no matching id in any of the source tables, the according row in a is not updated, which is typically what you want. You could use correlated subqueries instead to update all rows, no matter what - which is typically what you don't want, but your case may differ. Correlated subqueries are also typically much more expensive for big tables.

UPDATE a
SET    count_b = (SELECT b.counted FROM b WHERE b.id = a.id)
     , count_c = (SELECT c.counted FROM c WHERE c.id = a.id)

Again, you can avoid the NULL values with:

SET    count_b = COALESCE ((SELECT b.counted ...), count_b)

Would still waste empty updates which you are trying to minimize. You could add a WHERE clause with EXISTS for that ...

Update from multiple tables?

Answers (2)

Correlated subqueries

Related Questions