Reputation:
I'll start of with two independent queries, which gives me the correct results for each:
SELECT
DATE_TRUNC(ga.traffic_date, WEEK(MONDAY)) week_start,
SUM(traffic) traffic
FROM
`ga.daily_traffic`
WHERE traffic_date >= '2019-03-04'
Returns:
+--------------+---------+
| traffic_week | traffic |
+--------------+---------+
| 2019-03-04 | 66572 |
+--------------+---------+
The second query:
SELECT
week_start,
SUM(traffic) traffic
FROM
`marketing.channel_spend`
WHERE week_start = '2019-03-04'
Returns:
+------------+----------+
| week_start | spend |
+------------+----------+
| 2019-03-04 | 80143.07 |
+------------+----------+
I should note for that second query: The field week_start
is already stored at weekly increments, which is perhaps what's causing this(?) When I join the two together, as such:
SELECT
week_start,
SUM(spend) spend,
SUM(traffic) traffic
FROM
`ga.daily_traffic` ga
LEFT JOIN `marketing.channel_spend` chan
ON DATE_TRUNC(ga.traffic_date, WEEK(MONDAY)) = chan.week_start
WHERE week_start = '2019-03-04'
GROUP BY 1
ORDER BY 1 DESC
Produces the following:
+------------+---------+-----------+
| week_start | traffic | spend |
+------------+---------+-----------+
| 2019-03-04 | 153115 | 561001.49 |
+------------+---------+-----------+
What's causing the totals for traffic and spend to blow up?
Upvotes: 0
Views: 47
Reputation: 31991
you could use cte
with cte as
(
SELECT
DATE_TRUNC(ga.traffic_date, WEEK(MONDAY)) week_start,
SUM(traffic) traffic
FROM
`ga.daily_traffic`
WHERE traffic_date >= '2019-03-04'
),cte2 as
(
SELECT
week_start,
SUM(traffic) traffic
FROM
`marketing.channel_spend`
WHERE week_start = '2019-03-04'
) select cte.week_start,cte.traffic,cte2.traffic as chanel_traffic from cte left join cte2 on cte.week_start=cte2.week_start
Upvotes: 1
Reputation: 559
Gordon is correct. You most likely have a many to one or many to many relationship between the marketing.channel_spend
and the ga.daily_traffic
tables. In this case the occurrence of 2 or more of the same date in those 2 tables will yield a join on each occurrence in the first with every occurrence in the second table. This will explode your results. You should aggregate pre-join so that you are making a one to one join on the date meaning that none will duplicate.
SELECT
chan.week_start,
chan.spend spend,
ga.traffic traffic
FROM (
SELECT
SUM(traffic) traffic,
DATE_TRUNC(ga.traffic_date, WEEK(MONDAY)) ga_date
FROM
`ga.daily_traffic`
GROUP BY
ga_date
) ga
LEFT JOIN (
SELECT
SUM(spend) spend,
week_start
FROM
`marketing.channel_spend`
GROUP BY
week_start
) chan ON ga.ga_date = chan.week_start
WHERE chan.week_start = '2019-03-04'
Upvotes: 1