Liam
Liam

Reputation: 159

Aggregate by date and group and fill in missing dates in big query

I have a table with ID, a start date and an end date.

ID  start_date  end_date
1   01/01/2014  06/01/2014
2   10/01/2005  12/01/2015
3   08/01/2009  10/01/2012
...

I have another table with every event by ID with a date (truncated to month).

ID  month_year   amount
1   02/01/2014   100
1   03/01/2007   25
2   010/01/2010  50
...

I want to be able to get the monthly total by ID with the zero months added back in.

ID  month_year   amount
1   02/01/2007   100
1   03/01/2007   0
2   04/01/2007   0
...

Apologies if this is a basic question. It is very close to this question and this question, but due to the size of the dataset it is important to only fill in zeros for each ID between the start_date and end_date. It is also like enter link description here, but I'm unable to translate that into big query. My code right now is off, but something like:

WITH data AS(
            SELECT * FROM `Table 11` AS t0
), all_months AS (
   SELECT month
   FROM UNNEST(GENERATE_DATE_ARRAY(
     (SELECT MIN(start_date) FROM data)
     , (SELECT MAX(end_date) FROM data)
     , INTERVAL 1 MONTH)
   ) AS month
)

SELECT DISTINCT ID month_year, 
    SUM(Amount) OVER (PARTITION BY ID, month_year) AS sum_amount,   
FROM data AS t0
LEFT JOIN all_months AS t1
ON t0.month_year=t1.month

Upvotes: 0

Views: 1146

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below example is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.data` AS (
  SELECT 1 id, '01/01/2014' start_date, '06/01/2014' end_date UNION ALL
  SELECT 2, '10/01/2005', '12/01/2015' UNION ALL
  SELECT 3, '08/01/2009', '10/01/2012' 
), `project.dataset.amounts` AS (
  SELECT 1 id, '02/01/2014' month_year, 100 amount UNION ALL
  SELECT 1, '03/01/2007', 25 UNION ALL
  SELECT 2, '10/01/2010', 50 
), all_months AS (
  SELECT id, FORMAT_DATE('%m/%d/%Y', month_year) month_year
  FROM `project.dataset.data`,
  UNNEST(GENERATE_DATE_ARRAY(PARSE_DATE('%m/%d/%Y', start_date), PARSE_DATE('%m/%d/%Y', end_date), INTERVAL 1 MONTH)) month_year
)
SELECT id, month_year, SUM(IFNULL(amount, 0)) amount
FROM all_months m
LEFT JOIN `project.dataset.amounts` a
USING (id, month_year)
GROUP BY id, month_year   

Upvotes: 3

Related Questions