Reputation: 35
I have data that looks like this:
Status Order
NULL 1
NULL 2
1 3
NULL 4
NULL 5
0 6
NULL 7
NULL 8
NULL 9
1 10
I am trying to add a new column that fills in the nulls with the previous number, such as:
Status New
NULL NULL
NULL NULL
1 1
NULL 1
NULL 1
0 0
NULL 0
NULL 0
NULL 0
1 1
I am using hive, and do have access to pyspark, but was trying to solve through SQL if possible. I have looked into window functions (Rank, Dense_rank, Row_number, Lag, Lead). Any suggestions?
Upvotes: 0
Views: 280
Reputation: 1269763
What you need are equivalence groups. Then you can use window functions to fill in the data. A group can be defined as the number of non-null values up to an including a given value.
This assumes that you have a column that specifies the ordering.
So:
select t.*, max(status) over (partition by grp) as new_status
from (select t.*, count(status) over (order by ?) as grp
from t
) t;
The ?
is for the column that specifies the ordering.
Upvotes: 1