jab
jab

Reputation: 33

Duplicating records to fill gap between dates in Google BigQuery

So I've found similar resources that address how to do this in SQL, like this: Duplicating records to fill gap between dates

I understand that BigQuery may not be the best place to do this, so I'm trying to see if it's at all possible. When trying to run some of the methods in the link above above I'm hitting a wall as some of the functions aren't supported within BigQuery.

If a table exists with data structured like so:

    MODIFY_DATE             SKU         STORE   STOCK_ON_HAND
    08/01/2016 00:00:00     1120010     21      100
    08/05/2016 00:00:00     1120010     21      75
    08/07/2016 00:00:00     1120010     21      40

How can I build a query within Google BigQuery that yields an output like the one below? A value at a given date is repeated until the next change for the dates in between:

    MODIFY_DATE             SKU         STORE   STOCK_ON_HAND
    08/01/2016 00:00:00     1120010     21      100
    08/02/2016 00:00:00     1120010     21      100
    08/03/2016 00:00:00     1120010     21      100
    08/04/2016 00:00:00     1120010     21      100
    08/05/2016 00:00:00     1120010     21      75
    08/06/2016 00:00:00     1120010     21      75
    08/07/2016 00:00:00     1120010     21      40

I know I need to generate a table that has all the dates within a given range, but I'm having a hard time understanding if this can be done. Any ideas?

Upvotes: 3

Views: 1486

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

How can I build a query within Google BigQuery that yields an output like the one below? A value at a given date is repeated until the next change for the dates in between

See example below

SELECT
  MODIFY_DATE, 
  MAX(SKU_TEMP) OVER(PARTITION BY grp) AS SKU,
  MAX(STORE_TEMP) OVER(PARTITION BY grp) AS STORE,
  MAX(STOCK_ON_HAND_TEMP) OVER(PARTITION BY grp) AS STOCK_ON_HAND,
FROM (
  SELECT
    DAY AS MODIFY_DATE, SKU AS SKU_TEMP, STORE AS STORE_TEMP, STOCK_ON_HAND AS STOCK_ON_HAND_TEMP,
    COUNT(SKU) OVER(ORDER BY DAY ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp,
  FROM (
    SELECT DATE(DATE_ADD(TIMESTAMP("2016-08-01"), pos - 1, "DAY")) AS DAY
    FROM (
         SELECT ROW_NUMBER() OVER() AS pos, *
         FROM (FLATTEN((
         SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP("2016-08-07"), TIMESTAMP("2016-08-01")), '.'),'') AS h
         FROM (SELECT NULL)),h
    )))
  ) AS DATES
  LEFT JOIN (
    SELECT DATE(MODIFY_DATE) AS MODIFY_DATE, SKU, STORE, STOCK_ON_HAND 
    FROM 
      (SELECT "2016-08-01" AS MODIFY_DATE, "1120010" AS SKU, 21 AS STORE, 75 AS STOCK_ON_HAND),
      (SELECT "2016-08-05" AS MODIFY_DATE, "1120010" AS SKU, 22 AS STORE, 100 AS STOCK_ON_HAND),
      (SELECT "2016-08-07" AS MODIFY_DATE, "1120011" AS SKU, 23 AS STORE, 40 AS STOCK_ON_HAND),
  ) AS TABLE_WITH_GAPS
  ON TABLE_WITH_GAPS.MODIFY_DATE = DATES.DAY
)
ORDER BY MODIFY_DATE

Upvotes: 3

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

I need to generate a table that has all the dates within a given range, but I'm having a hard time understanding if this can be done. Any ideas?

SELECT DATE(DATE_ADD(TIMESTAMP("2016-08-01"), pos - 1, "DAY")) AS DAY
FROM (
     SELECT ROW_NUMBER() OVER() AS pos, *
     FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP("2016-08-07"), TIMESTAMP("2016-08-01")), '.'),'') AS h
     FROM (SELECT NULL)),h
)))

Upvotes: 0

Related Questions