Reputation: 475
I am sure I must be missing something obvious. I am trying to line up two tables with different measurement data for analysis, and my counts are coming back enormously high when I join the two tables together.
Here are the correct counts from my table1
select line_item_id,sum(is_imp) as imps
from table1
where line_item_id=5993252
group by 1;
Here are the correct counts from table2
select cs_line_item_id,sum(grossImpressions) as cs_imps
from table2
where cs_line_item_id=5993252
group by 1;
When I join the tables together, my counts become inaccurate:
select a.line_item_id,sum(a.is_imp) as imps,sum(c.grossImpressions) as cs_imps
from table1 a join table2 c
ON a.line_item_id=c.cs_line_item_id
where a.line_item_id=5993252
group by 1;
I'm using aggregates, group by, filtering, so I'm not sure where I'm going wrong. Here is the schema for these tables:
Upvotes: 0
Views: 634
Reputation: 1269773
You are generating a Cartesian product for each line_item_id
. There are two relatively simply ways to solve this, one with a full join
, the other with union all
:
select line_item_id, sum(imps) as imps, sum(grossImpressions) as cs_imps
from ((select a.line_time_id, sum(is_imp) as imps, 0 as grossImpressions
from table1 a
where a.line_item_id = 5993252
group by a.line_item_id
) union all
(select c.line_time_id, 0 as imps, sum(grossImpressions) as grossImpressions
from table2 c
where c.line_item_id = 5993252
group by c.line_item_id
)
) ac
group by line_item_id;
You can remove the where
clause from the subqueries to get the total for all line_tiem_id
s. Note that this works even when one or the other table has no matching rows for a given line_item_id
.
For performance, you really want to do the filtering before the group by
.
Upvotes: 1
Reputation: 2686
select a.*, b.imps table2_imps from
(select line_item_id,sum(is_imp) as imps
from table1
group by 1)a
join
(select line_item_id,sum(is_imp) as imps
from table1
group by 1)b
on a.select line_item_id=b.select line_item_id
Upvotes: 2