runner16
runner16

Reputation: 107

Fill Null Values with Last Previous Value and add 1 as a continuous integer for every value going forward - Big Query

I have these two columns in Big Query: budget_id and activity:

budget_id activity region execution window
000507_Corporate 507 Corporate 2022
000508_AMERICAS 508 AMERICAS 2022Q2
NULL NULL c b
NULL NULL c b

The budget_id comes from a Google Sheet which is manually inputted by a stakeholder. I'm trying to change this that going forward, I can automate this myself. Everything from 000001 to 000508 mostly comes from a spreadsheet. I'm trying to automated this going forward. However, this has been trickier than I thought.

WITH blah AS (
select *,
IF(activity IS NULL, last_value(activity ignore nulls) over (order by activity RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) + 1, activity) AS new_activity
from `rax-datamart-dev.marketing.auto_budget_framework`)
select budget_id,
activity, 
region,
execution_window,
new_activity
from blah
order by activity NULLS last
budget_id activity region execution window new_activity
000507_Corporate 507 Corporate 2022 507
000508_AMERICAS 508 AMERICAS 2022Q2 508
NULL NULL c b 509
NULL NULL c b 509

As you can see, I'm able to get 509 so that's promising. Unfortunately, I was hoping the next NULL value row would be 510. And then if there is another NULL value, 511, 512, 513 etc etc.

Is what I'm attempting even possible? I feel like I'm missing something but if anyone could help out, it would be greatly appreciated.

Upvotes: 1

Views: 1262

Answers (1)

Raul Saucedo
Raul Saucedo

Reputation: 1780

I'm using similar data as you posted in your question.

with data as (
 SELECT 'Americas_1 '  as id,1 as activity, 'America' as region union all
 SELECT 'Americas_2 '  as id,2 as activity, 'America' as region union all
 SELECT 'Americas_3 '  as id,3 as activity, 'America' as region union all
 SELECT 'Americas_4 '  as id,4 as activity, 'America' as region union all
 SELECT null  as id,null as activity, 'c' as region union all
 SELECT null  as id,null as activity, 'a' as region
)

In the subquery data, I just have the sample data. In the second subquery data2, I added a column number, this column adds the row_number when the activity column is null, if it is not null add a 0. The column new_activity just puts the same numbers when activity is not null.

Here you can see the complete query.

with data as (
 SELECT 'Americas_1 '  as id,1 as activity, 'America' as region union all
 SELECT 'Americas_2 '  as id,2 as activity, 'America' as region union all
 SELECT 'Americas_3 '  as id,3 as activity, 'America' as region union all
 SELECT 'Americas_4 '  as id,4 as activity, 'America' as region union all
 SELECT null  as id,null as activity, 'c' as region union all
 SELECT null  as id,null as activity, 'a' as region
), data2 as (
 
select id,activity, region, 
IF (activity is null,ROW_NUMBER() OVER(ORDER BY activity),0) as number,
IF(activity IS NULL,
last_value(activity ignore nulls) over (order by activity RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) ,
activity ) as new_activity
from data
group by id,activity, region
order by activity asc nulls last
)

This query displays these columns ID, activity, region; and in the last column new_Activity, I sum the column number and new_activity from the subquery data2

select id, activity, region, (number+new_activity) as new_Activity from data2
order by activity asc nulls last

This is the output of the query.

enter image description here

Upvotes: 2

Related Questions