Adam Hopkinson
Adam Hopkinson

Reputation: 28795

BigQuery LEFT JOIN is doubling-up values

I'm trying to merge two datasets - one of sales targets and another of actual sales, both by day and market (US/UK).

To do this, I'm using a third table which is using GENERATE_DATE_ARRAY to create a master list of dates to report on - so that I don't end up with gaps where there's no target set and no sales reported.

I've found I'm getting sales counted twice, so have reduced my data and query down to a reproducible state:

#standardSQL

WITH dates AS (
  SELECT day FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-07-05', '2018-07-09', INTERVAL 1 DAY)) AS day
),
targets AS (
  SELECT DATE '2018-07-06' AS day, 'UK' AS Market, NUMERIC '2.4' AS quantity
  UNION ALL SELECT '2018-07-06', "US", 8.4
  UNION ALL SELECT '2018-07-06', "US", 1.2
  UNION ALL SELECT '2018-07-08', "UK", 3.0
  UNION ALL SELECT '2018-07-08', "US", 10.9
),
sales AS (
  SELECT DATE '2018-07-08' AS day, 'UK' AS Market, 4 AS quantity
  UNION ALL SELECT '2018-07-06', 'US', 15
)

SELECT 
  dates.day AS day,
  targets.market AS market,
  SUM(targets.quantity) AS targetQuantity,
  SUM(sales.quantity) AS quantity
FROM dates
LEFT JOIN targets
  ON dates.day = CAST(targets.day AS DATE)
LEFT JOIN sales
  ON dates.day = CAST(sales.day AS DATE) AND targets.market = sales.market
GROUP BY day, market
ORDER BY day, market

This gives the following results:

Results of the query

The results show that the reported sales quantity for July 6th (row #3) is 30, despite being 15 in the data.

It's happening when there are two rows for that date & market in the targets data, but I can't work out how to code for this.

Thanks for your help!

Upvotes: 0

Views: 603

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below should work. The idea is to pre-aggregate both sales and targets tables to avoid dups

#standardSQL
WITH dates AS (
  SELECT day FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-07-05', '2018-07-09', INTERVAL 1 DAY)) AS day
), targets AS (
  SELECT DATE '2018-07-06' AS day, 'UK' AS Market, NUMERIC '2.4' AS quantity
  UNION ALL SELECT '2018-07-06', "US", 8.4
  UNION ALL SELECT '2018-07-06', "US", 1.2
  UNION ALL SELECT '2018-07-08', "UK", 3.0
  UNION ALL SELECT '2018-07-08', "US", 10.9
), sales AS (
  SELECT DATE '2018-07-08' AS day, 'UK' AS Market, 4 AS quantity
  UNION ALL SELECT '2018-07-06', 'US', 15
)
SELECT 
  dates.day AS day,
  t.market AS market,
  targetQuantity,
  quantity
FROM dates 
LEFT JOIN (SELECT day, market, SUM(quantity) AS targetQuantity FROM targets GROUP BY day, market) t
  ON dates.day = CAST(t.day AS DATE)
LEFT JOIN (SELECT day, market, SUM(quantity) AS quantity FROM sales GROUP BY day, market) s
  ON dates.day = CAST(s.day AS DATE) AND t.market = s.market
ORDER BY day, market

Upvotes: 2

Related Questions