Reputation: 11
So I'm working on SQL Server 2008 and I have this query which should be quite simple, but for some reason doesn't work. It basically looks like that:
SELECT TOP 10
u.Id AS "UserId",
u.CreationDate AS "Member since",
AVG(q.Score) AS "Average Question Rating",
COUNT(q.Id) AS "N. of Questions posted by the agent",
AVG(a.Score) AS "Average Answer Rating",
COUNT(a.Id) AS "N. of Answers posted by the agent"
FROM
Users u,
Answers a,
Questions q
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate
When I only work on either the Answers table or the Questions table, everything is ok. But as soon as I try to do both at once (like in the query above), the COUNTs don't work at all. What I get is that the COUNT(a.Id) is identical to the COUNT(q.Id). So I tried reducing my query to see what was wrong, and I realized that I just had to add the Questions or the Answers table (even without using them anywhere) to the FROM clause when working with the other table and everything was ruined.
I'm sure it's something ridiculously trivial that I have overlooked but it's driving me crazy, I'd be thankful if anybody could point me what went wrong. Thank you in advance.
Upvotes: 3
Views: 294
Reputation: 769
Wouldn't just counting the DISTINCT Ids work?
SELECT TOP 10
u.Id AS "UserId",
u.CreationDate AS "Member since",
AVG(q.Score) AS "Average Question Rating",
COUNT(DISTINCT q.Id) AS "N. of Questions posted by the agent",
AVG(a.Score) AS "Average Answer Rating",
COUNT(DISTINCT a.Id) AS "N. of Answers posted by the agent"
FROM
Users u,
Answers a,
Questions q
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate
Upvotes: 1
Reputation:
As noted elsewhere, your join on user ID on both questions and answers essentially produces a cartesian join at the user level between the two tables. A better approach would be to use a union:
SELECT TOP 10
u.Id AS "UserId",
u.CreationDate AS "Member since",
AVG(q_score) AS "Average Question Rating",
COUNT(q_id) AS "N. of Questions posted by the agent",
AVG(a_score) AS "Average Answer Rating",
COUNT(a_id) AS "N. of Answers posted by the agent"
FROM Users u
JOIN (select OwnerUserId,
Score q_score,
Id q_id,
NULL a_score,
NULL a_id
from Answers
union all
select OwnerUserId,
NULL q_score,
NULL q_id,
Score a_score,
Id a_id
from Questions) qa
ON qa.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate
Upvotes: 3
Reputation: 220797
You're not joining Answers
and Questions
correctly for the aggregation. Between Answers
and Questions
, the result is a cartesian product (for every user, every answer is coupled with every question)
The simplest way to correct this is to perform aggregation in subqueries:
SELECT TOP 10
u.Id AS "UserId",
u.CreationDate AS "Member since",
ISNULL((SELECT AVG(Score) FROM Answers WHERE OwnerUserId = u.Id), 0)
AS "Average Question Rating",
(SELECT COUNT(*) FROM Answers WHERE OwnerUserId = u.Id)
AS "N. of Questions posted by the agent",
ISNULL((SELECT AVG(Score) FROM Questions WHERE OwnerUserId = u.Id), 0)
AS "Average Answer Rating",
(SELECT COUNT(*) FROM Questions WHERE OwnerUserId = u.Id)
AS "N. of Answers posted by the agent"
FROM Users u
Alternatively using joins:
SELECT TOP 10
u.Id AS "UserId",
u.CreationDate AS "Member since",
ISNULL(q.a, 0) AS "Average Question Rating",
ISNULL(q.c, 0) AS "N. of Questions posted by the agent",
ISNULL(a.a, 0) AS "Average Answer Rating",
ISNULL(a.c, 0) AS "N. of Answers posted by the agent"
FROM Users u
-- If you LEFT JOIN these tables, you'll get also results for users without
-- questions or answers
LEFT OUTER JOIN (SELECT OwnerUserId, AVG(Score) a, COUNT(*) c
FROM Questions GROUP BY OwnerUserId) q
ON q.OwnerUserId = u.Id
LEFT OUTER JOIN (SELECT OwnerUserId, AVG(Score) a, COUNT(*) c
FROM Answers GROUP BY OwnerUserId) a
ON a.OwnerUserId = u.Id
I don't know SQL Server's query optimiser well enough, so I can't say which one is going to be faster. The first solution could take advantage of scalar subquery caching, if that is available in SQL Server. Otherwise, the second query maybe performs less nested loops.
Upvotes: 5
Reputation: 714
If it were me I would do explicit joins on those other tables (answers and questions). how is it linking the other tables if you don't do a join?
SELECT TOP 10
u.Id AS "UserId",
u.CreationDate AS "Member since",
AVG(q.Score) AS "Average Question Rating",
COUNT(q.Id) AS "N. of Questions posted by the agent",
AVG(a.Score) AS "Average Answer Rating",
COUNT(a.Id) AS "N. of Answers posted by the agent"
FROM
Users u,
Answers a,
Questions q
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate
would be
SELECT TOP 10
u.Id AS "UserId",
u.CreationDate AS "Member since",
AVG(q.Score) AS "Average Question Rating",
COUNT(q.Id) AS "N. of Questions posted by the agent",
AVG(a.Score) AS "Average Answer Rating",
COUNT(a.Id) AS "N. of Answers posted by the agent"
FROM
Users u
JOIN Answers a on u.ID = a.ID (assuming thats how answers and users are linked).
JOIN Questions q on a.ID = q.ID (assuming thats how questions and answers are linked)
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate
Upvotes: -1