ytu
ytu

Reputation: 1850

Counting with outer joins only returns data from which count is not zero

Say I have two tables: user and diary.

select count(id)
from user
where is_tester is false

Here the primary keys are always called id. This query above gives me about 270000, which means I have about 270000 users. Now I'd like to know how many diaries each user has. So I went:

select u.id as user_id, u.dm_type, count(d.id) as bg_count
from diary as d
right join (
    select id, dm_type
    from user
    where is_tester is false
) as u
on d.user_id = u.id
where d.glucose_value > 0
group by u.id, u.dm_type

Each user can have only one kind of dm_type. I am expecting that it would tell me how many diaries each user has, and if one doesn't have any diary, it would give me an NA or 0 since I used right join. However, the returned table has only about 75000 rows and every users in that table have at least one diary. That's not what I want. Why does it happen and how should I do it correctly?

I've referred to Combining RIGHT JOIN with COUNT, and count one specific field according to the suggestion from the accepted answer.


Edits according to the comments:

user:

| id | dm_type | is_tester |
|----|---------|-----------|
| 1  | 1       | False     |
| 2  | 1       | False     |
| 3  | 2       | False     |
| 4  | no      | False     |
| 5  | 2       | True      |

diary:

| id | user_id | glucose_value |
|----|---------|---------------|
| 1  | 1       | -2            |
| 2  | 1       | 80            |
| 3  | 2       | 78            |
| 4  | 2       | 100           |
| 5  | 4       | 83            |
| 6  | 5       | 90            |

Expected result:

| user_id | dm_type | bg_count |
|---------|---------|----------|
| 1       | 1       | 1        |
| 2       | 1       | 2        |
| 3       | 2       | 0        |
| 4       | no      | 1        |

Upvotes: 1

Views: 421

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270843

The problem with your query is the where clause. It is filtering out unmatched diaries from the outer join.

When you are approaching this type of problem, I strongly recommend left join over right join. It means "keep all the rows in the first table even if nothing matches in the second". This is usually easier to follow than "keep all the rows in whatever table is at the end of the from clause but I haven't seen yet".

The next rule is that conditions on the first table go in the where clause because they really filter rows. Conditions on the second table go in the on clause. They don't filter rows but are used for matching.

So, you can phrase your query as:

select u.id as user_id, u.dm_type, count(d.id) as bg_count
from user u left join
     diary d
     on d.user_id = u.id and d.glucose_value > 0
where u.is_tester is false
group by u.id, u.dm_type;

No subquery is needed.

Upvotes: 0

Fahmi
Fahmi

Reputation: 37483

Try with left join, it will give you all user whether it has diary count or not If any user has no diary then it will give you null

select u.id as user_id, u.dm_type, count(d.id) as bg_count from
(select id, dm_type from user where is_tester is false)u
left join diary d on d.user_id = u.id and d.glucose_value > 0
group by u.id, u.dm_type

Upvotes: 1

Related Questions