Reputation: 17956
We have a tool that allows users to create their own groups. Within these groups, users can write posts. What I am trying to determine is the relationship between size of the group and total number of posts in that group.
I can do SQL statements to get a list of group names and the number of users in that group (Query 1) and a list of group names and the number of posts (Query 2) but I would like for both to be in the same query.
Query 1
select count(pg.personID) as GroupSize, g.GroupName
from Group g inner join PersonGroup pg g.GroupID = pg.GroupID
where LastViewed between @startDate and @enddate and
g.Type = 0
group by g.GroupID, g.GroupName
order by GroupSize
Query 2
select count(gp.PostID) as TotalPosts, g.GroupName
from Group g inner join GroupPost gp on g.GroupID = gp.GroupID
inner join Post p on gp.PostID = p.PostID
where g.Type = 0 and
gp.Created between @startDate and @enddate
group by g.GroupID, g.GroupName
order by TotalPosts
**Note: A person can post the same "post" to multiple groups
I believe from this data I could build a Histogram (# of groups with 10-20 users, 21-30 users, etc..) and incorporate average number of posts for groups in those different bins.
Upvotes: 0
Views: 1939
Reputation: 1269445
Paul's solution assumes that the two sets of groups (by posts and by users) is the same. This may not be true, so either a full outer join or union all is needed.
My preference is the following:
with groups as
(
select *
from Group g
where g.Type = 0
and g.LastViewed between @startDate and @enddate
)
select GroupId, GroupName, SUM(GroupSize) as GroupSize, SUM(TotalPosts) as TotalPosts)
from
(
(select groups.GroupId, groups.GroupName, 1 as GroupSize, 0 as TotalPosts
from groups
join PersonGroup pg
on pg.GroupId = groups.groupId
)
union all
(select groups.GroupId, groups.GroupName, 0 as GroupSize, 1 as TotalPosts
from groups
join GroupPost gp
on groups.GroupId = gp.GroupId
join Post p
on gp.PostId = p.PostId
)
)
group by GroupId, GroupName
The "with" clause defines the set of groups that you are using. This places the definition in one place, making it obvious that the two subqueries have the same filtering. The two subqueries simply have flags indicating each of the two variables, which are then aggregated at the higher level. Sometimes it is more efficient to also do the aggregation inside the subqueries, particularly when there are indexes.
Upvotes: 0
Reputation: 21014
A simple solution would be to use those queries as Sub queries, and combine them:
SELECT
grps.GroupName,
grps.GroupSize,
psts.TotalPosts
FROM (
select count(pg.personID) as GroupSize, g.GroupName, g.GroupID
from Group g inner join PersonGroup pg g.GroupID = pg.GroupID
where LastViewed between @startDate and @enddate and
g.Type = 0
group by g.GroupID, g.GroupName
order by GroupSize) grps
JOIN (
select count(gp.PostID) as TotalPosts, g.GroupName, g.groupID
from Group g inner join GroupPost gp on g.GroupID = gp.GroupID
inner join Post p on gp.PostID = p.PostID
where g.Type = 0 and
gp.Created between @startDate and @enddate
group by g.GroupID, g.GroupName
order by TotalPosts) psts
ON psts.GroupID = grps.GroupID
Upvotes: 2