user11052700
user11052700

Reputation:

JOIN is blowing up my totals despite being correct independently

I'll start of with two independent queries, which gives me the correct results for each:

SELECT
  DATE_TRUNC(ga.traffic_date, WEEK(MONDAY)) week_start,
  SUM(traffic) traffic
FROM
  `ga.daily_traffic` 
  WHERE traffic_date >= '2019-03-04'

Returns:

+--------------+---------+
| traffic_week | traffic |
+--------------+---------+
| 2019-03-04   |   66572 |
+--------------+---------+

The second query:

  SELECT
  week_start,
  SUM(traffic) traffic
FROM
 `marketing.channel_spend`
 WHERE week_start = '2019-03-04'

Returns:

+------------+----------+
| week_start |  spend   |
+------------+----------+
| 2019-03-04 | 80143.07 |
+------------+----------+

I should note for that second query: The field week_start is already stored at weekly increments, which is perhaps what's causing this(?) When I join the two together, as such:

SELECT
  week_start,
  SUM(spend) spend,
  SUM(traffic) traffic
FROM
  `ga.daily_traffic` ga
LEFT JOIN `marketing.channel_spend` chan
ON DATE_TRUNC(ga.traffic_date, WEEK(MONDAY)) = chan.week_start
WHERE week_start = '2019-03-04'
GROUP BY 1
ORDER BY 1 DESC

Produces the following:

+------------+---------+-----------+
| week_start | traffic |   spend   |
+------------+---------+-----------+
| 2019-03-04 |  153115 | 561001.49 |
+------------+---------+-----------+

What's causing the totals for traffic and spend to blow up?

Upvotes: 0

Views: 47

Answers (2)

Zaynul Abadin Tuhin
Zaynul Abadin Tuhin

Reputation: 31991

you could use cte

with cte as
(
SELECT
  DATE_TRUNC(ga.traffic_date, WEEK(MONDAY)) week_start,
  SUM(traffic) traffic
FROM
  `ga.daily_traffic` 
  WHERE traffic_date >= '2019-03-04'
),cte2 as
(
SELECT
  week_start,
  SUM(traffic) traffic
FROM
 `marketing.channel_spend`
 WHERE week_start = '2019-03-04'
) select cte.week_start,cte.traffic,cte2.traffic as chanel_traffic  from  cte left join cte2 on cte.week_start=cte2.week_start

Upvotes: 1

Roman Czerwinski
Roman Czerwinski

Reputation: 559

Gordon is correct. You most likely have a many to one or many to many relationship between the marketing.channel_spend and the ga.daily_traffic tables. In this case the occurrence of 2 or more of the same date in those 2 tables will yield a join on each occurrence in the first with every occurrence in the second table. This will explode your results. You should aggregate pre-join so that you are making a one to one join on the date meaning that none will duplicate.

SELECT
    chan.week_start,
    chan.spend spend,
    ga.traffic traffic
FROM (
    SELECT
        SUM(traffic) traffic,
        DATE_TRUNC(ga.traffic_date, WEEK(MONDAY)) ga_date
    FROM
        `ga.daily_traffic` 
    GROUP BY
        ga_date
) ga
LEFT JOIN (
    SELECT
        SUM(spend) spend,
        week_start
    FROM
        `marketing.channel_spend`
    GROUP BY
        week_start
) chan ON ga.ga_date = chan.week_start
WHERE chan.week_start = '2019-03-04'

Upvotes: 1

Related Questions