Reputation: 31
I have the following DataFrame, to which I want to apply bfill as follows:
'amount' | 'percentage' |
---|---|
NaN | |
1.0 | 20 |
2.0 | 10 |
NaN | |
NaN | |
NaN | |
NaN | |
3.0 | 50 |
4.0 | 10 |
NaN | |
5.0 | 10 |
I want to bfill the NaN values in the amount column according to the percentage column: if the percentage on a row is 50, then only 50% of the NaN rows directly before that row should be filled (a partial fill). For example, the row with amount 3.0 has a percentage of 50, so of the 4 NaN entries before it, only 2 (the ones closest to it) should be back-filled.
Proposed output:
'amount' | 'percentage' |
---|---|
NaN | |
1.0 | 20 |
2.0 | 10 |
NaN | |
NaN | |
3.0 | |
3.0 | |
3.0 | 50 |
4.0 | 10 |
NaN | |
5.0 | 10 |
Please help.
Upvotes: 3
Views: 502
Reputation: 10970
Create groups according to the NaNs, so that each number shares a group with the NaN rows directly before it
df['group_id'] = df.amount.where(df.amount.isna(), 1).cumsum().bfill()
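To see why that one-liner groups each number with the NaN rows before it, here is a minimal step-by-step sketch on the question's amount column. The intermediate names marker and running are only for illustration and are not part of the answer's code.

```python
import numpy as np
import pandas as pd

# The amount column from the question
df = pd.DataFrame({
    "amount": [np.nan, 1.0, 2.0, np.nan, np.nan, np.nan, np.nan,
               3.0, 4.0, np.nan, 5.0],
})

# Step 1: keep NaN where amount is NaN, replace every number with 1
marker = df["amount"].where(df["amount"].isna(), 1)

# Step 2: running count of the numbers; cumsum skips NaN, so NaN rows stay NaN
running = marker.cumsum()

# Step 3: back-fill, so each NaN row takes the id of the next number below it
group_id = running.bfill()

print(group_id.tolist())
```

Each group therefore ends with exactly one number, preceded by the NaN rows that it may fill.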
Create a filling function
def custom_fill(x):
    # Calculate number of rows to be filled: the last row of the group holds
    # the percentage (column 1), and the other rows of the group are the NaNs
    max_fill_rows = math.floor(x.iloc[-1, 1] * (x.shape[0] - 1) / 100)
    # Fill only if number of rows to fill is not zero
    return x.bfill(limit=max_fill_rows) if max_fill_rows else x
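As a quick check of the arithmetic, the sketch below computes the fill limit for a few groups from the question using plain Python. The helper name max_fill_rows_for is hypothetical and only mirrors the formula inside custom_fill.

```python
import math

def max_fill_rows_for(percentage, group_size):
    """Rows to back-fill: percentage of the NaN rows in a group.

    group_size counts the NaN rows plus the one number that closes the group.
    """
    n_missing = group_size - 1
    return math.floor(percentage * n_missing / 100)

# Group ending in 3.0: 4 NaN rows + 1 number, percentage 50
limit_for_3 = max_fill_rows_for(50, 5)
# Group ending in 1.0: 1 NaN row + 1 number, percentage 20
limit_for_1 = max_fill_rows_for(20, 2)
# Group ending in 2.0: no NaN rows, percentage 10
limit_for_2 = max_fill_rows_for(10, 1)

print(limit_for_3, limit_for_1, limit_for_2)
```

A limit of zero is the reason for the guard in custom_fill: bfill expects a positive limit, so groups with nothing to fill are returned unchanged.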
Fill the DataFrame
df.groupby('group_id', group_keys=False).apply(custom_fill)  # group_keys=False keeps the original flat index on pandas 2.x
Output
    amount  percentage  group_id
0      NaN         NaN       1.0
1      1.0        20.0       1.0
2      2.0        10.0       2.0
3      NaN         NaN       3.0
4      NaN         NaN       3.0
5      3.0        50.0       3.0
6      3.0        50.0       3.0
7      3.0        50.0       3.0
8      4.0        10.0       4.0
9      NaN         NaN       5.0
10     5.0        10.0       5.0
PS: Don't forget to import the required library:
import math
Upvotes: 4