Reputation: 10009
I have three tables related to each other in the following way:
The table structure is as follows:
id, name
id, host_id, name
id, session_id, name
What I am trying to achieve is the count of the number of sessions and the count of the number of processes on each host. To achieve this I tried the following query, but the output is wrong.
select host.id,
count(sessions.id) as "session count",
count(process.id) as "process count"
from host as host
left outer join sessions as sessions on host.id = sessions.host_id
left outer join process as process on sessions.id = process.session_id
group by host.id;
Here's the SQLFiddle to the schema.
As per the data in the fiddle, the output should be:
id | session count | process count
----------------------------------
1 | 2 | 3
2 | 1 | 2
3 | 1 | 2
4 | 2 | 3
But what I get is:
id | session count | process count
----------------------------------
1 | 3 | 3
2 | 2 | 2
3 | 2 | 2
4 | 3 | 3
What can be the correct query to get the desired output?
Upvotes: 2
Views: 125
Reputation: 8881
The real issue here is that you have a chain of 1 to many relationships you are working with. If it were just one relationship in the chain a count() function would work fine with no issues. But having them chained together results in the intermediary object (Session in this case) being replicated numerous times by the final relationship. This is why you are getting elevated Session counts.
You could use distinct, which counts each identifier only once. The answer by John Faz is correct, but you would only really need one distinct, not two, since the final table of the relationship (process) won't be replicated.
select
host_id = H.ID,
session_count = count(distinct S.ID),
process_count = count(P.ID)
from host H
left join sessions S on H.ID = S.host_id
left join process as P on S.ID = P.session_id
group by H.ID
Another option would be to perform your count in multiple stages using a CTE. I think this would be less performant, particularly if you have a larger set of data, but it accurately models the counts you're trying to do.
;with cteSessions (session_id, host_id, process_count) as (
select
session_id = S.ID,
S.host_id,
process_count = count(1)
from sessions S
left join process P on S.ID = P.session_id
group by
S.ID,
S.host_id
)
select
host_id = H.ID,
session_count = count(S.session_id),
process_count = sum(isnull(s.process_count, 0))
from host H
left join cteSessions S on H.ID = S.host_id
group by
H.ID
You could also use sub-queries. Which I hate, but it would work
select
host_id = H.ID,
session_count = (select count(1) from sessions s where s.host_id = H.ID),
process_count = (select count(1) from sessions s join process p on s.id = p.session_id where s.host_id = H.ID)
from host H
Upvotes: 0
Reputation: 25310
John Faz's answer is better, however as you asked for other ways, it is possible to do this with sub queries as well like this:
select
host.id,
(select count(*) from sessions where host_id = host.id) as "session count",
(select count(*) from process join sessions on process.session_id = sessions.id where sessions.host_id = host.id) as "process count"
from
host
EDIT:
Actually I take back that bit about John Faz's answer being better. I just ran an execution plan over the two and my query took 28% and John's took 50% (22% set up and tear down). I was using only the very small amount of data from the SQL Fiddle example and with big data and different index choices things are likely to be different. However it does show that this query may be better in some circumstances.
Upvotes: 1
Reputation: 112324
If you query without the group by
-clause, you will see that you are getting the same session id multiple times. Therefore you sessions count is too high.
select h.id as hid, s.id as sid, p.id as pid
from host h
left join sessions s on h.id = s.host_id
left join process p on s.id = p.session_id
order by h.id, s.id, p.id;
hid sid pid
-----------
1 1 1
1 1 2
1 2 5
2 5 8
2 5 9
3 3 3
3 3 7
4 4 4
4 4 6
4 6 10
Therefore use count(distinct s.id)
for the sessions:
select h.id as hid, count(distinct s.id) as session_count, count(p.id) as process_count
from host h
left join sessions s on h.id = s.host_id
left join process p on s.id = p.session_id
group by h.id
Upvotes: 1
Reputation:
Distinct;
select host.id,
count(distinct sessions.id) as "session count",
count(distinct process.id) as "process count"
from host as host
left outer join sessions as sessions on host.id = sessions.host_id
left outer join process as process on sessions.id = process.session_id
group by host.id;
Upvotes: 6