Prerak Sola
Prerak Sola

Reputation: 10009

SQL count using multiple join

I have three tables related to each other in the following way:

  1. host (has multiple sessions)
  2. session (has multiple processes)
  3. processes

The table structure is as follows:

  1. host table - id, name
  2. session table - id, host_id, name
  3. process table - id, session_id, name

What I am trying to achieve is the count of the number of sessions and the count of the number of processes on each host. To achieve this I tried the following query, but the output is wrong.

select host.id, 
       count(sessions.id) as "session count", 
       count(process.id) as "process count"
from host as host
     left outer join sessions as sessions on host.id = sessions.host_id
     left outer join process as process on sessions.id = process.session_id
group by host.id;

Here's the SQLFiddle to the schema.

As per the data in the fiddle, the output should be:

id | session count | process count 
----------------------------------
1  |     2         |   3
2  |     1         |   2
3  |     1         |   2
4  |     2         |   3

But what I get is:

id | session count | process count 
----------------------------------
1  |     3         |   3
2  |     2         |   2
3  |     2         |   2
4  |     3         |   3

What can be the correct query to get the desired output?

Upvotes: 2

Views: 125

Answers (4)

theo
theo

Reputation: 8881

The real issue here is that you have a chain of 1 to many relationships you are working with. If it were just one relationship in the chain a count() function would work fine with no issues. But having them chained together results in the intermediary object (Session in this case) being replicated numerous times by the final relationship. This is why you are getting elevated Session counts.

You could use distinct, which counts each identifier only once. The answer by John Faz is correct, but you would only really need one distinct, not two, since the final table of the relationship (process) won't be replicated.

select
    host_id = H.ID,
    session_count = count(distinct S.ID),
    process_count = count(P.ID)
    from host H
        left join sessions S on H.ID = S.host_id
        left join process as P on S.ID = P.session_id
    group by H.ID

Another option would be to perform your count in multiple stages using a CTE. I think this would be less performant, particularly if you have a larger set of data, but it accurately models the counts you're trying to do.

;with cteSessions (session_id, host_id, process_count) as (
    select
        session_id = S.ID,
        S.host_id,
        process_count = count(1)
        from sessions S
            left join process P on S.ID = P.session_id
        group by
            S.ID,
            S.host_id
)
select
    host_id = H.ID,
    session_count = count(S.session_id),
    process_count = sum(isnull(s.process_count, 0))
    from host H
        left join cteSessions S on H.ID = S.host_id
    group by 
        H.ID

You could also use sub-queries. Which I hate, but it would work

select
    host_id = H.ID,
    session_count = (select count(1) from sessions s where s.host_id = H.ID),
    process_count = (select count(1) from sessions s join process p on s.id = p.session_id where s.host_id = H.ID)
    from host H

Upvotes: 0

Martin Brown
Martin Brown

Reputation: 25310

John Faz's answer is better, however as you asked for other ways, it is possible to do this with sub queries as well like this:

select
  host.id,
  (select count(*) from sessions where host_id = host.id) as "session count",
  (select count(*) from process join sessions on process.session_id = sessions.id where sessions.host_id = host.id)  as "process count"
 from
   host

EDIT:

Actually I take back that bit about John Faz's answer being better. I just ran an execution plan over the two and my query took 28% and John's took 50% (22% set up and tear down). I was using only the very small amount of data from the SQL Fiddle example and with big data and different index choices things are likely to be different. However it does show that this query may be better in some circumstances.

Upvotes: 1

Olivier Jacot-Descombes
Olivier Jacot-Descombes

Reputation: 112324

If you query without the group by-clause, you will see that you are getting the same session id multiple times. Therefore you sessions count is too high.

select h.id as hid, s.id as sid, p.id as pid
from host h
left join sessions s on h.id = s.host_id
left join process p on s.id = p.session_id
order by h.id, s.id, p.id;

hid sid pid
-----------
1   1   1
1   1   2
1   2   5
2   5   8
2   5   9
3   3   3
3   3   7
4   4   4
4   4   6
4   6   10

Therefore use count(distinct s.id) for the sessions:

select h.id as hid, count(distinct s.id) as session_count, count(p.id) as process_count
from host h
left join sessions s on h.id = s.host_id
left join process p on s.id = p.session_id
group by h.id

Upvotes: 1

user1529235
user1529235

Reputation:

Distinct;

select host.id, 
       count(distinct sessions.id) as "session count", 
       count(distinct process.id) as "process count"
from host as host
     left outer join sessions as sessions on host.id = sessions.host_id
     left outer join process as process on sessions.id = process.session_id
group by host.id;

Upvotes: 6

Related Questions