jexplore
jexplore

Reputation: 69

How to cross join but using latest value in BIGQUERY

I have this table below

date id value
2021-01-01 1 3
2021-01-04 1 5
2021-01-05 1 10

And I expect output like this, where the date column is always increase daily and value column will generate the last value on an id

date id value
2021-01-01 1 3
2021-01-02 1 3
2021-01-03 1 3
2021-01-04 1 5
2021-01-05 1 10
2021-01-06 1 10

I think I can use cross join but I can't get my expected output and think that there are a special syntax/logic to solve this

Upvotes: 1

Views: 636

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270873

I would write this using:

select dte, t.id, t.value
from (select t.*,
             lead(date, 1, date '2021-01-06') over (partition by id order by date) as next_day
      from `table` t
     ) t cross join
     unnest(generate_date_array(
             date,
             ifnull(
                     date_add(next_date, interval -1 day), -- generate missing date rows
                     (select max(date) from `table`) -- add last row
             )
     )) dte;

Note that this requires neither union all nor window function to fill in the values.

Upvotes: 1

Logan
Logan

Reputation: 1371

alternative solution using last_value. You may explore the following query and customize your logic to generate days (if needed)

WITH
  query AS (
  SELECT
    date,
    id,
    value
  FROM
    `mydataset.newtable`
  ORDER BY
    date ),
  generated_days AS (
  SELECT
    day
  FROM (
    SELECT
      MIN(date) min_dt,
      MAX(date) max_dt
    FROM
      query),
    UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) day )
SELECT
  g.day,
  LAST_VALUE(q.id IGNORE NULLS) OVER(ORDER BY g.day) id,
  LAST_VALUE(q.value IGNORE NULLS) OVER(ORDER BY g.day) value,
FROM
  generated_days g
LEFT OUTER JOIN
  query q
ON
  g.day = q.date
ORDER BY
  g.day

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173171

Consider below approach

select * from `project.dataset.table` 
union all
select missing_date, prev_row.id, prev_row.value
from (
  select *, lag(t) over(partition by id order by date) prev_row
  from `project.dataset.table` t
), unnest(generate_date_array(prev_row.date + 1, date - 1)) missing_date

Upvotes: 1

Related Questions