João Maia
João Maia

Reputation: 115

How do I fill null columns with values from the previous row?

I am trying to perform something here. I want to have all the coluns filled with values. But when I have the null column, I want to have it filled with the value from the previous not null column.

with cte as (
select '2019-11-12 16:01:55' as timestamp, null as owner_id, null as owner_assigneddate, null as lastmodifieddate union all
select '2019-11-12 19:03:18' as timestamp, 39530934 as owner_id, '2019-11-12 19:03:18' as owner_assigneddate, '2019-11-12 19:03:18' as lastmodifieddate union all
select '2019-11-12 19:03:19' as timestamp, null as owner_id, null as owner_assigneddate, '2019-11-12 19:03:19' as lastmodifieddate union all
select '2019-11-12 19:03:20' as timestamp, null as owner_id, null as owner_assigneddate, '2019-11-12 19:03:20' as lastmodifieddate union all
select '2019-11-12 19:03:31' as timestamp, 40320368 as owner_id, '2019-11-12 19:03:31' as owner_assigneddate, '2019-11-12 19:03:31' as lastmodifieddate union all
select '2019-11-12 19:03:33' as timestamp, null as owner_id, null as owner_assigneddate, '2019-11-12 19:03:33' as lastmodifieddate union all
select '2019-11-12 19:03:56' as timestamp, null as owner_id, null as owner_assigneddate, '2019-11-12 19:03:356' as lastmodifieddate)

select timestamp,
       owner_id,
       owner_assigneddate,
       lastmodifieddate,
       COALESCE(owner_id, LEAD(owner_id) OVER(ORDER BY timestamp DESC)) AS test_column
from cte order by timestamp asc 

With the previous query I already managed to put the value only in the next row.

What I want to do is to have the all the columns filled with the value based on the previous row. The value from row 4 should be 39530934 and the value from row 7 should be 40320368. I think I am missing something here, but I dont't know what.

Upvotes: 4

Views: 4555

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269673

In BigQuery use LAST_VALUE() with the IGNORE NULLS option and COALESCE():

select timestamp,
       COALESCE(owner_id, last_value(owner_id ignore nulls) over (order by timestamp)) as owner_id,
       COALESCE(owner_assigneddate, LAST_VALUE(owner_assigneddate IGNORE NULLS) OVER (ORDER BY TIMESTAMP)) as owner_assigneddate,
       COALESCE(lastmodifieddate, LAST_VALUE(lastmodifieddate IGNORE NULLS) OVER (ORDER BY TIMESTAMP)) as lastmodifieddate
from cte order by timestamp asc 

Upvotes: 8

GMB
GMB

Reputation: 222432

As far as concerns, Big Query does not support ignore null in window functions. Here is a solution that relies on a window max to locate the record that holds the last non-null owner_id (this assumes uniqueness of timestamps). With this information at hand, you can then bring in the corresponding owner_id with a join.

select 
    c.timestamp,
    coalesce(c.owner_id, c_lag.owner_id) owner_id,
    c.owner_assigneddate,
    c.lastmodifieddate
from 
    (
        select
            cte.*,
            max(case when owner_id is not null then timestamp end) 
                over(order by timestamp rows unbounded preceding) target_timestamp
        from cte
    ) c
    left join cte c_lag 
        on c.owner_id is null 
        and c_lag.timestamp = c.target_timestamp

Demo on DB Fiddle:

timestamp           | owner_id | owner_assigneddate  | lastmodifieddate    
:------------------ | -------: | :------------------ | :-------------------
2019-11-12 16:01:55 |     null | null                | null                
2019-11-12 19:03:18 | 39530934 | 2019-11-12 19:03:18 | 2019-11-12 19:03:18 
2019-11-12 19:03:19 | 39530934 | null                | 2019-11-12 19:03:19 
2019-11-12 19:03:20 | 39530934 | null                | 2019-11-12 19:03:20 
2019-11-12 19:03:31 | 40320368 | 2019-11-12 19:03:31 | 2019-11-12 19:03:31 
2019-11-12 19:03:33 | 40320368 | null                | 2019-11-12 19:03:33 
2019-11-12 19:03:56 | 40320368 | null                | 2019-11-12 19:03:356

Note: to better understand the logic if needed, you can run the inner query independantly to see what it returns (see the db fiddle).


Edit

Rereading this, I find that the information that the window max provides is already available in your original data, in column owner_assigneddate... So that’s far simpler:

select 
    c.timestamp,
    coalesce(c.owner_id, c_lag.owner_id) owner_id,
    c.owner_assigneddate,
    c.lastmodifieddate
from
    cte c
    left join cte c_lag 
        on c.owner_id is null 
        and c_lag.timestamp = c.owner_assigneddate

Upvotes: 2

Michael Entin
Michael Entin

Reputation: 7734

This should work with your cte definition:

...
select timestamp,
       owner_id,
       owner_assigneddate,
       lastmodifieddate,
       LAST_VALUE(owner_id IGNORE NULLS) 
            OVER(ORDER BY timestamp ASC ROWS BETWEEN 
                 UNBOUNDED PRECEDING AND CURRENT ROW) AS test_column
from cte order by timestamp asc 

Upvotes: 2

Related Questions