Jonathan Vanasco
Jonathan Vanasco

Reputation: 15680

Update from multiple tables?

I'm trying to construct a query in which I update table_a with data contained in both table_b and table_c.

All the UPDATE ... FROM docs and questions I've found seem to be focused on updating table_a with data in table_c by joining through table_b. I don't want that, I need to use data contained in both table_b and table_c, which are both left joined to table_a

An example select would be:

SELECT 
        a.id 
    ,   b.counted as count_b
    ,   c.counted as count_c
FROM 
    table_a a
LEFT JOIN
    table_b b on a.id = b.id
LEFT JOIN
    table_c c on a.id = c.id
;


 id | count_b | count_c
----+---------+---------
 34 |       6 |       2
 43 |      16 |       2
 45 |      33 |       5

Working 'one table'-updates are:

UPDATE table_a SET count_b = table_b.counted
FROM table_b WHERE table_a.id = table_b.id;

UPDATE table_a SET count_c = table_c.counted
FROM table_c WHERE table_a.id = table_c.id;

However I'd like to combine them for efficiency (update is an insert+delete, so there's a bit of savings to be had. But I can't seem to figure out how to join two tables onto table_a in an UPDATE ... FROM.

I could handle this with a subquery - but I'd prefer avoiding that.

Upvotes: 1

Views: 75

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656586

To shorten my longish answer I work with table names a, b, c.
If a.id, b.id and c.id are UNIQUE (but can be missing):

Basic form with a number of caveats (keep reading!):

UPDATE a
SET    count_b = b.counted
     , count_c = c.counted
FROM   b
FULL   JOIN c USING (id)
WHERE  a.id = COALESCE(b.id, c.id)
;

Wouldn't work properly if any of the id columns are not UNIQUE - but neither would separate updates.

If nothing actually changes (all counted columns identical to what's in a already) you would just create a dead row for full cost. If that can be the case and you want to avoid it, append an additional WHERE clause:

AND   (a.count_b <> b.counted OR
       a.count_c <> c.counted)  -- simple form with all columns NOT NULL

FULL [OUTER] JOIN makes it work if some, but not all tables (one in the simple case of two source tables) does not have a row for the same id. It effectively emulates multiple LEFT JOIN you had in mind, with a small but important difference: You get phantom NULL values for columns from tables where the row is missing. Catch that with COALESCE:

SET    count_b = COALESCE(b.counted, a.count_b)

There is still a corner case if counted can be NULL. COALESCE cannot distinguish between two cases:
1. b.counted IS NULL.
2. b.counted does not exist.
If you need the distinction, use a CASE statement instead:

SET    count_b = CASE WHEN b.id IS NULL THEN a.count_b ELSE b.counted END

Adapt WHERE clauses accordingly.
The complete version, covering all NULL cases and avoiding empty updates:

UPDATE a
SET    count_b = CASE WHEN b.id IS NULL THEN a.count_b ELSE b.counted END
     , count_c = CASE WHEN c.id IS NULL THEN a.count_b ELSE c.counted END
FROM   b
FULL   JOIN c USING (id)
WHERE  a.id = COALESCE(b.id, c.id)
AND   (b.id IS NOT NULL AND b.counted IS DISTINCT FROM a.count_b OR
       c.id IS NOT NULL AND c.counted IS DISTINCT FROM a.count_c)
;

Looks complex, but should be faster if there is substantial overlap between the source tables.

Correlated subqueries

If there is no matching id in any of the source tables, the according row in a is not updated, which is typically what you want. You could use correlated subqueries instead to update all rows, no matter what - which is typically what you don't want, but your case may differ. Correlated subqueries are also typically much more expensive for big tables.

UPDATE a
SET    count_b = (SELECT b.counted FROM b WHERE b.id = a.id)
     , count_c = (SELECT c.counted FROM c WHERE c.id = a.id)

Again, you can avoid the NULL values with:

SET    count_b = COALESCE ((SELECT b.counted ...), count_b)

Would still waste empty updates which you are trying to minimize. You could add a WHERE clause with EXISTS for that ...

Upvotes: 1

Rahul
Rahul

Reputation: 77866

Not the exact query but more like a example. you can try like below, by getting the counted column from both table_b and table_b in a inner query and joining with it to table_a in update.

UPDATE table_a 
SET count_b = tableb_counted,
count_c = tablec_counted
FROM (
  SELECT table_b.counted as tableb_counted,
  table_b.id,
  table_c.counted as tablec_counted
  FROM table_b
  JOIN table_c
  ON table_b.id = table_c.id
) tab
WHERE table_a.id = tab.id 

Upvotes: 0

Related Questions