Manura Omal
Manura Omal

Reputation: 1055

How to generate date series to occupy absent dates in google BiqQuery?

I am trying to get daily sum of sales from a google big-query table. I used following code for that.

select Day(InvoiceDate) date, Sum(InvoiceAmount) sales from test_gmail_com.sales 
where year(InvoiceDate) = Year(current_date()) and
Month(InvoiceDate) = Month(current_date())
group by date order by date

From the above query it gives only the sum of sales daily which were in the table. There is a chance that some days do not have any sales. For those kind of situations, I need to get the date and sum should be 0. As an example, in every month should 30 0r 31 rows with sum of sales. Examples show below. 4th day of the month does not have a sales. So its sum should be 0.

date | sales
-----+------
1    |   259
-----+------
2    |   359
-----+------
3    |   45
-----+------
4    |    0
-----+------
5    |  156

Is it possible to do in Big-query? Basically date column should be a series from 1 - 28/29/30 or 31st depending on the month of the year

Upvotes: 21

Views: 41350

Answers (5)

Cohen
Cohen

Reputation: 730

Generting a list of dates and then joining whatever table you need on top seems the easiest. I used the generate_date_array + unnest and it looks quite clean.

To generate a list of days (one day per row):

  SELECT
  *
  FROM 
    UNNEST(GENERATE_DATE_ARRAY('2018-10-01', '2020-09-30', INTERVAL 1 DAY)) AS example

Upvotes: 63

blueogive
blueogive

Reputation: 528

Using the Standard SQL dialect and the generate_array function to simplify the code:

WITH serialnum AS (
  SELECT
    sn
  FROM
    UNNEST(GENERATE_ARRAY(0, 
                          DATE_DIFF(DATE_ADD(DATE_TRUNC(CURRENT_DATE()
                                                      , MONTH)
                                          , INTERVAL 1 MONTH)
                                  , DATE_TRUNC(CURRENT_DATE(), MONTH)
                                  , DAY) - 1)
                          ) AS sn
), date_seq AS (
SELECT
    DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH),
            INTERVAL(sn) DAY) AS this_day
FROM
  serialnum
)
SELECT
    Day(InvoiceDate) date
    , Sum(IFNULL(InvoiceAmount, 0)) sales
FROM
    date_seq
    LEFT JOIN
    test_gmail_com.sales
ON
    date_seq.this_day = DAY(test_gmail_com.sales.InvoiceDate)
WHERE
    year(InvoiceDate) = Year(current_date())
    and
    Month(InvoiceDate) = Month(current_date())
GROUP BY
    date
ORDER BY
    date
;

UPDATE

Or, simpler still using the generate_date_array function:

WITH date_seq AS (
SELECT
  GENERATE_DATE_ARRAY(DATE_TRUNC(CURRENT_DATE(), MONTH), 
                      DATE_ADD(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH)
                                        , INTERVAL 1 MONTH)
                               , INTERVAL -1 DAY)
                      , INTERVAL 1 DAY)
    AS this_day
)
SELECT
    Day(InvoiceDate) date
    , Sum(IFNULL(InvoiceAmount, 0)) sales
FROM
    date_seq
    LEFT JOIN
    test_gmail_com.sales
ON
    date_seq.this_day = DAY(test_gmail_com.sales.InvoiceDate)
WHERE
    year(InvoiceDate) = Year(current_date())
    and
    Month(InvoiceDate) = Month(current_date())
GROUP BY
    date
ORDER BY
    date
;

Upvotes: 4

martin.code
martin.code

Reputation: 1311

For Standard SQL

WITH

splitted AS (
  SELECT
    *
  FROM
    UNNEST( SPLIT(RPAD('',
          1 + DATE_DIFF(CURRENT_DATE(), DATE("2015-06-01"), DAY),
          '.'),''))),
  with_row_numbers AS (
  SELECT
    ROW_NUMBER() OVER() AS pos,
    *
  FROM
    splitted),
  calendar_day AS (
  SELECT
    DATE_ADD(DATE("2015-06-01"), INTERVAL (pos - 1) DAY) AS day
  FROM
    with_row_numbers)
SELECT
  *
FROM
  calendar_day
ORDER BY
  day DESC

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

You can use below to generate on fly all dates in given range (in below example it is all dates from 2015-06-01 till CURRENT_DATE() - by changing those you can control which dates range to generate)

SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
FROM (
     SELECT ROW_NUMBER() OVER() AS pos, *
     FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
     FROM (SELECT NULL)),h
)))

so, now - you can use it with LEFT JOIN with your table to have all dates accounted. See potential example below

SELECT
  calendar_day,
  IFNULL(sales, 0) AS sales
FROM (
  SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
  FROM (
       SELECT ROW_NUMBER() OVER() AS pos, *
       FROM (FLATTEN((
       SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
       FROM (SELECT NULL)),h
  )))
) AS all_dates
LEFT JOIN (
  SELECT DAY(InvoiceDate) DATE, SUM(InvoiceAmount) sales 
  FROM test_gmail_com.sales 
  WHERE YEAR(InvoiceDate) = YEAR(CURRENT_DATE()) AND
  MONTH(InvoiceDate) = MONTH(CURRENT_DATE())
  GROUP BY DATE 
)
ON DATE = calendar_day  

I wanna need to get previous months sales

Below gives all days of previous month

SELECT DATE(DATE_ADD(DATE_ADD(DATE_ADD(CURRENT_DATE(), -1, "MONTH"), 1 - DAY(CURRENT_DATE()), "DAY"), pos - 1, "DAY")) AS calendar_day
FROM (
     SELECT ROW_NUMBER() OVER() AS pos, *
     FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + DATEDIFF(DATE_ADD(CURRENT_DATE(), - DAY(CURRENT_DATE()), "DAY"), DATE_ADD(DATE_ADD(CURRENT_DATE(), -1, "MONTH"), 1 - DAY(CURRENT_DATE()), "DAY")), '.'),'') AS h
     FROM (SELECT NULL)),h
)))

Upvotes: 12

oulenz
oulenz

Reputation: 1244

For these purposes it is practical to have a 'calendar' table, a table that just lists all the days within a certain range. For your specific question, it would suffice to have a table with the numbers 1 to 31. A quick way to get this table is to make a spreadsheet with these numbers, save it as a csv file and import this file into BigQuery as a table.

You then left outer join your result set onto this table, with ifnull(sales,0) as sales.

If you want the number of days per month (28--31) to be right, you basically have two options. Either you create a proper calendar table that covers several years and that you join on using year, month and day. Or you use the simple table with numbers 1--31 and remove numbers based on the month and the year.

Upvotes: 1

Related Questions