Why doesn't this simple SQL query work?

So I'm working on SQL Server 2008 and I have this query which should be quite simple, but for some reason doesn't work. It basically looks like that:

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q.Score)        AS "Average Question Rating",  
    COUNT(q.Id)         AS "N. of Questions posted by the agent",
    AVG(a.Score)        AS "Average Answer Rating",  
    COUNT(a.Id)         AS "N. of Answers posted by the agent"
FROM    
        Users u, 
        Answers a, 
        Questions q
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate

When I only work on either the Answers table or the Questions table, everything is ok. But as soon as I try to do both at once (like in the query above), the COUNTs don't work at all. What I get is that the COUNT(a.Id) is identical to the COUNT(q.Id). So I tried reducing my query to see what was wrong, and I realized that I just had to add the Questions or the Answers table (even without using them anywhere) to the FROM clause when working with the other table and everything was ruined.

I'm sure it's something ridiculously trivial that I have overlooked but it's driving me crazy, I'd be thankful if anybody could point me what went wrong. Thank you in advance.

Upvotes: 3

Views: 294

Answers (4)

domager
domager

Reputation: 769

Wouldn't just counting the DISTINCT Ids work?

SELECT TOP 10 
    u.Id                         AS "UserId", 
    u.CreationDate               AS "Member since", 
    AVG(q.Score)                 AS "Average Question Rating",   
    COUNT(DISTINCT q.Id)         AS "N. of Questions posted by the agent", 
    AVG(a.Score)                 AS "Average Answer Rating",   
    COUNT(DISTINCT a.Id)          AS "N. of Answers posted by the agent" 
FROM     
        Users u,  
        Answers a,  
        Questions q 
WHERE q.OwnerUserId = u.Id 
AND a.OwnerUserId = u.Id 
GROUP BY u.Id, u.CreationDate 

Upvotes: 1

user359040
user359040

Reputation:

As noted elsewhere, your join on user ID on both questions and answers essentially produces a cartesian join at the user level between the two tables. A better approach would be to use a union:

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q_score)        AS "Average Question Rating",  
    COUNT(q_id)         AS "N. of Questions posted by the agent",
    AVG(a_score)        AS "Average Answer Rating",  
    COUNT(a_id)         AS "N. of Answers posted by the agent"
FROM Users u
JOIN (select OwnerUserId,
             Score        q_score,
             Id           q_id,
             NULL         a_score,
             NULL         a_id
      from Answers
      union all
      select OwnerUserId,
             NULL         q_score,
             NULL         q_id,
             Score        a_score,
             Id           a_id
      from Questions) qa
  ON qa.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate

Upvotes: 3

Lukas Eder
Lukas Eder

Reputation: 220797

You're not joining Answers and Questions correctly for the aggregation. Between Answers and Questions, the result is a cartesian product (for every user, every answer is coupled with every question)

The simplest way to correct this is to perform aggregation in subqueries:

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    ISNULL((SELECT AVG(Score) FROM Answers   WHERE OwnerUserId = u.Id), 0)
                        AS "Average Question Rating",  
           (SELECT COUNT(*)   FROM Answers   WHERE OwnerUserId = u.Id)        
                        AS "N. of Questions posted by the agent",
    ISNULL((SELECT AVG(Score) FROM Questions WHERE OwnerUserId = u.Id), 0)
                        AS "Average Answer Rating",  
           (SELECT COUNT(*)   FROM Questions WHERE OwnerUserId = u.Id)
                        AS "N. of Answers posted by the agent"
FROM  Users u

Alternatively using joins:

SELECT TOP 10
     u.Id                AS "UserId",
     u.CreationDate      AS "Member since",
     ISNULL(q.a, 0)      AS "Average Question Rating",  
     ISNULL(q.c, 0)      AS "N. of Questions posted by the agent",
     ISNULL(a.a, 0)      AS "Average Answer Rating",  
     ISNULL(a.c, 0)      AS "N. of Answers posted by the agent"
FROM Users u
-- If you LEFT JOIN these tables, you'll get also results for users without
-- questions or answers
LEFT OUTER JOIN (SELECT OwnerUserId, AVG(Score) a, COUNT(*) c 
     FROM Questions GROUP BY OwnerUserId) q
     ON  q.OwnerUserId = u.Id
LEFT OUTER JOIN (SELECT OwnerUserId, AVG(Score) a, COUNT(*) c 
     FROM Answers GROUP BY OwnerUserId) a
     ON  a.OwnerUserId = u.Id

I don't know SQL Server's query optimiser well enough, so I can't say which one is going to be faster. The first solution could take advantage of scalar subquery caching, if that is available in SQL Server. Otherwise, the second query maybe performs less nested loops.

Upvotes: 5

Bryce
Bryce

Reputation: 714

If it were me I would do explicit joins on those other tables (answers and questions). how is it linking the other tables if you don't do a join?

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q.Score)        AS "Average Question Rating",  
    COUNT(q.Id)         AS "N. of Questions posted by the agent",
    AVG(a.Score)        AS "Average Answer Rating",  
    COUNT(a.Id)         AS "N. of Answers posted by the agent"
FROM    
        Users u, 
        Answers a, 
        Questions q
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate

would be

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q.Score)        AS "Average Question Rating",  
    COUNT(q.Id)         AS "N. of Questions posted by the agent",
    AVG(a.Score)        AS "Average Answer Rating",  
    COUNT(a.Id)         AS "N. of Answers posted by the agent"
FROM    
        Users u
JOIN Answers a on u.ID = a.ID (assuming thats how answers and users are linked).
JOIN Questions q on a.ID = q.ID (assuming thats how questions and answers are linked)
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate

Upvotes: -1

Related Questions