CelloRibeiro
CelloRibeiro

Reputation: 160

Replace missing values based on value of a specific column in Python

I would like to replace missing values based on the values of the column Submitted.

Find below what I have:

Year Country Submitted Age12 Age14
2018 CHI 1 267 NaN
2019 CHI NaN NaN NaN
2020 CHI 1 244 203
2018 ALB 1 163 165
2019 ALB 1 NaN NaN
2020 ALB 1 161 NaN
2018 GER 1 451 381
2019 GER NaN NaN NaN
2020 GER 1 361 321

An this is what I would like to have:

Year Country Submitted Age12 Age14
2018 CHI 1 267 NaN
2019 CHI NaN 267 NaN
2020 CHI 1 244 203
2018 ALB 1 163 165
2019 ALB 1 NaN NaN
2020 ALB 1 161 NaN
2018 GER 1 451 381
2019 GER NaN 451 381
2020 GER 1 361 321

I tried using the command df.fillna(axis=0, method='ffill') But this replace all values NaN by the previous, but this is not what I want because some values should be kept as NaN if the "Submitted" column value is 1.

I would like to change the values by the previous row only if the respective "Submitted" value is "NaN".

Thank you

Upvotes: 1

Views: 1410

Answers (4)

Shreyas Prakash
Shreyas Prakash

Reputation: 614

I just used a for loop to check and update the values in the dataframe

import pandas as pd
new_data = [[2018,'CHI',1,267,30], [2019,'CHI','NaN','NaN','NaN'], [2020,'CHI',1,244,203]]
df = pd.DataFrame(new_data, columns = ['Year','Country','Submitted','Age12','Age14'])
prevValue12 = df.iloc[0]['Age12']
prevValue14 = df.iloc[0]['Age14']
for index, row in df.iterrows():
    if(row['Submitted']=='NaN'):
        df.at[index,'Age12']=prevValue12
        df.at[index,'Age14']=prevValue14
    prevValue12 = row['Age12']
    prevValue14 = row['Age14']
print(df)

output

Year Country Submitted Age12 Age14
0  2018     CHI         1   267    30
1  2019     CHI       NaN   267    30
2  2020     CHI         1   244   203

Upvotes: 0

SeaBean
SeaBean

Reputation: 23227

You can use .filter() to select the related columns and put the columns in the list cols. Then, use .mask() to change the values of the selected columns by forward fill using ffill() when Submitted is NaN, as follows:

cols = df.filter(like='Age').columns

df[cols] = df[cols].mask(df['Submitted'].isna(), df[cols].ffill())

Result:

print(df)

   Year Country  Submitted  Age12  Age14
0  2018     CHI        1.0  267.0    NaN
1  2019     CHI        NaN  267.0    NaN
2  2020     CHI        1.0  244.0  203.0
3  2018     ALB        1.0  163.0  165.0
4  2019     ALB        1.0    NaN    NaN
5  2020     ALB        1.0  161.0    NaN
6  2018     GER        1.0  451.0  381.0
7  2019     GER        NaN  451.0  381.0
8  2020     GER        1.0  361.0  321.0

Upvotes: 0

Allen Qin
Allen Qin

Reputation: 19957

You can do a conditional ffill() using np.where

import numpy as np
(
    df.assign(Age12=np.where(df.Submitted.isna(), df.Age12.ffill(), df.Age12))
    .assign(Age14=np.where(df.Submitted.isna(), df.Age14.ffill(), df.Age14))
)

Upvotes: 0

Ami Tavory
Ami Tavory

Reputation: 76396

Try using where together with what you did:

 df = df.where(~df.Sumbitted.isnull(), df.fillna(axis=0, method='ffill'))

This will replace the entries only when Submitted is null.

Upvotes: 1

Related Questions