Reputation: 107
I have these two columns in Big Query: budget_id and activity:
budget_id | activity | region | execution window |
---|---|---|---|
000507_Corporate | 507 | Corporate | 2022 |
000508_AMERICAS | 508 | AMERICAS | 2022Q2 |
NULL | NULL | c | b |
NULL | NULL | c | b |
The budget_id comes from a Google Sheet which is manually inputted by a stakeholder. I'm trying to change this that going forward, I can automate this myself. Everything from 000001 to 000508 mostly comes from a spreadsheet. I'm trying to automated this going forward. However, this has been trickier than I thought.
WITH blah AS (
select *,
IF(activity IS NULL, last_value(activity ignore nulls) over (order by activity RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) + 1, activity) AS new_activity
from `rax-datamart-dev.marketing.auto_budget_framework`)
select budget_id,
activity,
region,
execution_window,
new_activity
from blah
order by activity NULLS last
budget_id | activity | region | execution window | new_activity |
---|---|---|---|---|
000507_Corporate | 507 | Corporate | 2022 | 507 |
000508_AMERICAS | 508 | AMERICAS | 2022Q2 | 508 |
NULL | NULL | c | b | 509 |
NULL | NULL | c | b | 509 |
As you can see, I'm able to get 509 so that's promising. Unfortunately, I was hoping the next NULL value row would be 510. And then if there is another NULL value, 511, 512, 513 etc etc.
Is what I'm attempting even possible? I feel like I'm missing something but if anyone could help out, it would be greatly appreciated.
Upvotes: 1
Views: 1262
Reputation: 1780
I'm using similar data as you posted in your question.
with data as (
SELECT 'Americas_1 ' as id,1 as activity, 'America' as region union all
SELECT 'Americas_2 ' as id,2 as activity, 'America' as region union all
SELECT 'Americas_3 ' as id,3 as activity, 'America' as region union all
SELECT 'Americas_4 ' as id,4 as activity, 'America' as region union all
SELECT null as id,null as activity, 'c' as region union all
SELECT null as id,null as activity, 'a' as region
)
In the subquery data
, I just have the sample data. In the second subquery data2
, I added a column number
, this column adds the row_number
when the activity column
is null, if it is not null add a 0. The column new_activity
just puts the same numbers when activity is not null.
Here you can see the complete query.
with data as (
SELECT 'Americas_1 ' as id,1 as activity, 'America' as region union all
SELECT 'Americas_2 ' as id,2 as activity, 'America' as region union all
SELECT 'Americas_3 ' as id,3 as activity, 'America' as region union all
SELECT 'Americas_4 ' as id,4 as activity, 'America' as region union all
SELECT null as id,null as activity, 'c' as region union all
SELECT null as id,null as activity, 'a' as region
), data2 as (
select id,activity, region,
IF (activity is null,ROW_NUMBER() OVER(ORDER BY activity),0) as number,
IF(activity IS NULL,
last_value(activity ignore nulls) over (order by activity RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) ,
activity ) as new_activity
from data
group by id,activity, region
order by activity asc nulls last
)
This query displays these columns ID, activity, region
; and in the last column new_Activity
, I sum the column number
and new_activity
from the subquery data2
select id, activity, region, (number+new_activity) as new_Activity from data2
order by activity asc nulls last
This is the output of the query.
Upvotes: 2