Joe Doe
Joe Doe

Reputation: 573

count is multiplied after adding left join

in my query http://www.sqlfiddle.com/#!9/16b9c2/3/0

SELECT f.name, COUNT(s.letters), AVG(t.numbers)

FROM first f

INNER JOIN second s 
ON s.first_id = f.id AND s.letters IN ('aa', 'bb', 'cc', 'dd')

LEFT JOIN third t
ON t.first_id = f.id

GROUP BY f.id

before joining third table the count was 4 - 1 but after joining third table it became 16 - 1 which is false because it must be 4 - 1

Upvotes: 1

Views: 254

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270081

Your average may also be subtly off -- or at least require additional unnecessarily calculation.

The reason is that the number of rows is being multiplied along two dimensions. The more accurate solution is to aggregate before doing the join:

SELECT f.name, s.num_letters, AVG(t.numbers)
FROM first f INNER JOIN
     (SELECT s.first_id, COUNT(*) as num_letters
      FROM second s 
      WHERE s.letters IN ('aa', 'bb', 'cc', 'dd')  
      GROUP BY s.first_id
     ) s
     ON s.first_id = f.id LEFT JOIN
     third t
     ON t.first_id = f.id
GROUP BY f.id, s.num_letters;

For such queries it might be best to move all the aggregation into subqueries:

SELECT f.name, s.num_letters, t.avg_numbers
FROM first f INNER JOIN
     (SELECT s.first_id, COUNT(*) as num_letters
      FROM second s 
      WHERE s.letters IN ('aa', 'bb', 'cc', 'dd')  
      GROUP BY s.first_id
     ) s
     ON s.first_id = f.id LEFT JOIN
     (SELECT t.first_id, AVG(t.numbers) as avg_numbers
      FROM third t
      GROUP BY t.first_id
     ) t
     ON t.first_id = f.id;

Upvotes: 2

yifanwu
yifanwu

Reputation: 1645

A bit ugly but gets the answer: http://www.sqlfiddle.com/#!9/16b9c2/26

SELECT s2.name, s2.letters, AVG(t.numbers)

FROM (
  select f.id, f.name as name, COUNT(s.letters) as letters
  from first f
  JOIN second s 
    ON s.first_id = f.id AND s.letters IN ('aa', 'bb', 'cc', 'dd')
  GROUP BY f.id) s2
JOIN third t
  ON t.first_id = s2.id
group by s2.name;

Whether it's a left join or join or inner join in your case makes no difference since every value is joinable. The "fanout" occurs because the numbers and letters are effectively cross joined since there are no join conditions on them.

The solution suggested does two group by's which gets rid of the crossing by aggregating at every level.

Upvotes: 1

Bohemian
Bohemian

Reputation: 425083

Each row returned is counted, so multiple successful joins from f to s will then be joined again for each successful jointo t.

To fix, use count(distinct ...):

SELECT f.name, COUNT(distinct s.letters), AVG(t.numbers)
...

Upvotes: 0

Related Questions