user3456269
user3456269

Reputation: 475

Redshift Query returning too many rows in aggregate join

I am sure I must be missing something obvious. I am trying to line up two tables with different measurement data for analysis, and my counts are coming back enormously high when I join the two tables together.

Here are the correct counts from my table1

select line_item_id,sum(is_imp) as imps 
from table1 
where line_item_id=5993252 
group by 1;

enter image description here

Here are the correct counts from table2

select cs_line_item_id,sum(grossImpressions) as cs_imps
from table2 
where cs_line_item_id=5993252 
group by 1;

enter image description here

When I join the tables together, my counts become inaccurate:

select a.line_item_id,sum(a.is_imp) as imps,sum(c.grossImpressions) as cs_imps
from table1 a join table2 c
ON a.line_item_id=c.cs_line_item_id
where a.line_item_id=5993252
group by 1;

enter image description here

I'm using aggregates, group by, filtering, so I'm not sure where I'm going wrong. Here is the schema for these tables: enter image description here

Upvotes: 0

Views: 634

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

You are generating a Cartesian product for each line_item_id. There are two relatively simply ways to solve this, one with a full join, the other with union all:

select line_item_id, sum(imps) as imps, sum(grossImpressions) as cs_imps
from ((select a.line_time_id, sum(is_imp) as imps, 0 as grossImpressions
       from table1 a
       where a.line_item_id = 5993252
       group by a.line_item_id
      ) union all
      (select c.line_time_id, 0 as imps, sum(grossImpressions) as grossImpressions
       from table2 c
       where c.line_item_id = 5993252
       group by c.line_item_id
      )
     ) ac
group by line_item_id;

You can remove the where clause from the subqueries to get the total for all line_tiem_ids. Note that this works even when one or the other table has no matching rows for a given line_item_id.

For performance, you really want to do the filtering before the group by.

Upvotes: 1

Daniel Marcus
Daniel Marcus

Reputation: 2686

select a.*, b.imps table2_imps from
(select line_item_id,sum(is_imp) as imps 
from table1  
group by 1)a
join 
(select line_item_id,sum(is_imp) as imps 
from table1  
group by 1)b
on a.select line_item_id=b.select line_item_id

Upvotes: 2

Related Questions