Reputation: 160
I would like to replace missing values based on the values of the column Submitted.
Find below what I have:
Year | Country | Submitted | Age12 | Age14 |
---|---|---|---|---|
2018 | CHI | 1 | 267 | NaN |
2019 | CHI | NaN | NaN | NaN |
2020 | CHI | 1 | 244 | 203 |
2018 | ALB | 1 | 163 | 165 |
2019 | ALB | 1 | NaN | NaN |
2020 | ALB | 1 | 161 | NaN |
2018 | GER | 1 | 451 | 381 |
2019 | GER | NaN | NaN | NaN |
2020 | GER | 1 | 361 | 321 |
An this is what I would like to have:
Year | Country | Submitted | Age12 | Age14 |
---|---|---|---|---|
2018 | CHI | 1 | 267 | NaN |
2019 | CHI | NaN | 267 | NaN |
2020 | CHI | 1 | 244 | 203 |
2018 | ALB | 1 | 163 | 165 |
2019 | ALB | 1 | NaN | NaN |
2020 | ALB | 1 | 161 | NaN |
2018 | GER | 1 | 451 | 381 |
2019 | GER | NaN | 451 | 381 |
2020 | GER | 1 | 361 | 321 |
I tried using the command df.fillna(axis=0, method='ffill') But this replace all values NaN by the previous, but this is not what I want because some values should be kept as NaN if the "Submitted" column value is 1.
I would like to change the values by the previous row only if the respective "Submitted" value is "NaN".
Thank you
Upvotes: 1
Views: 1410
Reputation: 614
I just used a for loop to check and update the values in the dataframe
import pandas as pd
new_data = [[2018,'CHI',1,267,30], [2019,'CHI','NaN','NaN','NaN'], [2020,'CHI',1,244,203]]
df = pd.DataFrame(new_data, columns = ['Year','Country','Submitted','Age12','Age14'])
prevValue12 = df.iloc[0]['Age12']
prevValue14 = df.iloc[0]['Age14']
for index, row in df.iterrows():
if(row['Submitted']=='NaN'):
df.at[index,'Age12']=prevValue12
df.at[index,'Age14']=prevValue14
prevValue12 = row['Age12']
prevValue14 = row['Age14']
print(df)
output
Year Country Submitted Age12 Age14
0 2018 CHI 1 267 30
1 2019 CHI NaN 267 30
2 2020 CHI 1 244 203
Upvotes: 0
Reputation: 23227
You can use .filter()
to select the related columns and put the columns in the list cols
. Then, use .mask()
to change the values of the selected columns by forward fill using ffill()
when Submitted
is NaN
, as follows:
cols = df.filter(like='Age').columns
df[cols] = df[cols].mask(df['Submitted'].isna(), df[cols].ffill())
Result:
print(df)
Year Country Submitted Age12 Age14
0 2018 CHI 1.0 267.0 NaN
1 2019 CHI NaN 267.0 NaN
2 2020 CHI 1.0 244.0 203.0
3 2018 ALB 1.0 163.0 165.0
4 2019 ALB 1.0 NaN NaN
5 2020 ALB 1.0 161.0 NaN
6 2018 GER 1.0 451.0 381.0
7 2019 GER NaN 451.0 381.0
8 2020 GER 1.0 361.0 321.0
Upvotes: 0
Reputation: 19957
You can do a conditional ffill()
using np.where
import numpy as np
(
df.assign(Age12=np.where(df.Submitted.isna(), df.Age12.ffill(), df.Age12))
.assign(Age14=np.where(df.Submitted.isna(), df.Age14.ffill(), df.Age14))
)
Upvotes: 0
Reputation: 76396
Try using where
together with what you did:
df = df.where(~df.Sumbitted.isnull(), df.fillna(axis=0, method='ffill'))
This will replace the entries only when Submitted
is null.
Upvotes: 1