Add missing date rows inheriting the previous value

Question

I'm trying to obtain, using Big Query, the table on "Desired result" (see figure below), starting from a Dummy table which can be initialized as follows:

with dummy_data as
(
  select * from unnest([
    struct(
      date('2021-06-10') AS timestamp,
      'A' AS category,
      '5' as value),
    (date('2021-06-11'), 'B', '3'),
    (date('2021-06-12'), 'C', '2'),
    (date('2021-06-14'), 'A', '7')
  ])
)
select * from dummy_data

I've made a start by generating a time series and cross joining but having trouble getting rid of the unnecessary rows that arise from this:

with dummy_data as
(
  select * from unnest([
    struct(
      date('2021-06-10') AS timestamp,
      'A' AS category,
      '5' as values ),
    (date('2021-06-11'), 'B', '3'),
    (date('2021-06-12'), 'C', '2'),
    (date('2021-06-14'), 'A', '7')
  ])
),
-- Initialize date series that covers needed range
date_series as (
    select
        *
    from 
        unnest(generate_date_array('2021-06-10', '2021-06-14', interval 1 day)) as date
),
-- cross_join
cross_join as (
    select * from dummy_data
    cross join date_series
)
select * from cross_join

Which results in the following (rows in blue are the new ones I need, rows in gray the ones I'd like to remove):

Thanks!

Mikhail Berlyant · Accepted Answer

Consider below approach

select  timestamp, category, value 
from (
  select timestamp as start_timestamp, category, value, ifnull(
    lead(timestamp) over(partition by category order by timestamp) - 1,
    max(timestamp) over(order by timestamp desc) 
    ) next_timestamp
  from dummy_data
), unnest(generate_date_array(start_timestamp, next_timestamp)) timestamp

if applied to sample data in your question - output is

Add missing date rows inheriting the previous value

Answers (1)

Related Questions