Cheetaiean
Cheetaiean

Reputation: 941

How to join two tables when a matching row may not potentially be in one of the tables

I need to merge two tables, spring_stats and summer_stats, into one table in which some of the columns are the same and thus should be summed.

Each table contains (among others) the fields hunter_id, year, fowl, deer, bears where the last three represent numeric amount each hunter has caught.

The end result should be hunter_id, year, spring.fowl + summer.fowl, spring.deer + summer.deer, etc

HOWEVER, some of the hunters may not have participated in the summer session but participated in the spring session (or vice versa). In this case the standard

SELECT hunter_id, year, spring.fowl + summer.fowl AS total_fowl, ... FROM spring, summer 
WHERE spring.hunter_id = summer.hunter_id AND spring.year = summer.year 

would not work as hunters who were active in only the spring or summer session would not be recorded and included, whereas I need all hunters included, regardless of whether they were active in only one session or both.

Upvotes: 0

Views: 36

Answers (1)

JNevill
JNevill

Reputation: 50034

You are using an ancient type of table join. Instead adopt the newer (since the early 90s) join syntax. Here you want a FULL OUTER JOIN:

SELECT COALESCE(summer.hunter_id, spring.hunter_id) as hunter_id, 
    COALESCE(summer.year, spring.year) as year, 
    spring.fowl + summer.fowl AS total_fowl, ... 
FROM spring 
    FULL OUTER JOIN summer  
        ON spring.hunter_id = summer.hunter_id 
        AND spring.year = summer.year 

You can read up about FULL OUTER JOIN here at the postgres documentation site:

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.

The COALESCE() function will first use the hunter_id from the summer table unless it's NULL (due to the FULL OUTER JOIN) in which case it will pick the hunter_id from the spring table.


I think it's also worth mentioning that having tables specific to your seasons may not make sense. Instead a single table where Season is just an added column may be a better schema:

 season | year | hunter_id | animal | animal_count
 summer | 2020 | 1         | fowl   | 3
 spring | 2020 | 1         | deer   | 2
 summer | 2020 | 2         | fowl   | 4
 spring | 2021 | 3         | fowl   | 1

Now your query is:

SELECT 
    hunter_id, 
    year, 
    sum(CASE WHEN animal='fowl' THEN animal_count END) as fowl_total,
    sum(CASE WHEN animal='dear' THEN animal_count END) as deer_total
FROM this_new_table
GROUP BY hunter_id, year

Upvotes: 2

Related Questions