Reputation: 636
I'm currently facing the following problem: I have 3 tables I need information from and both of these joins are one to many. For some reason second join creates duplicates of rows and as a result second return value gets messed up (bb.count gets multiplied by the amount of second join rows)
SELECT aa.id, sum(bb.count), count(DISTINCT cc.id)
FROM aaaa aa
LEFT JOIN bbbb bb ON bb.aa_id = aa.id
LEFT JOIN cccc cc ON cc.bb_id = bb.id
GROUP BY aa.id
Is there a way to get the proper sum of bb.count without another query? The moment I remove second left join everything's fine, unfortunately I need it for the third return value and I can't group them without resulting in a duplicate (sort of) rows in result.
Lets say there's
bb1.count = 9
bb2.count = 5
And there's 2 rows where cc.bb_id = bb1.id
The result I get is 23 instead of 14.
Upvotes: 9
Views: 10158
Reputation: 28233
You're experiencing aggregate fanout in the above query.
This happens because there is
aaa
& bbb
bbb
& ccc
The latter join creates M
duplicates for rows that exist in bbb
if they are joined to M rows via the join to ccc
To fix the error, split the query up into two CTEs & join the result.
WITH agg_bb AS (
SELECT aa.id, sum(bb.count)
FROM aaaa aa
LEFT JOIN bbbb bb ON bb.aa_id = aa.id
GROUP BY aa.id
)
, agg_cc AS (SELECT aa.id, count(DISTINCT cc.id)
FROM aaaa aa
LEFT JOIN bbbb bb ON bb.aa_id = aa.id
LEFT JOIN cccc cc ON cc.bb_id = bb.id
GROUP BY aa.id
)
SELECT * FROM agg_bb JOIN agg_cc USING (id)
In general, to avoid fan outs, only apply aggregate operations to the columns from the rightmost relation in a series of joins. If you find you're aggregating columns from the middle tables, split the query up as I have done above. Only the following functions are invariant across a fan out: COUNT DISTINCT
, MIN
, MAX
Upvotes: 15