I would like to replace missing values based on the values of the column Submitted. Find below what I have: Year Country Submitted Age12 Age14 2018 CHI 1 267 NaN 2019 CHI NaN NaN NaN 2020 CHI 1 244 203 2018 ALB 1 163 165 2019 ALB 1 NaN NaN 2020 ALB 1 161 NaN 2018 GER 1 451 381 2019 GER NaN NaN NaN 2020 GER 1 361 321 An this is what I would like to have: Year Country Submitted Age12 Age14 2018 CHI 1 267 NaN 2019 CHI NaN 267 NaN 2020 CHI 1 244 203 2018 ALB 1 163 165 2019 ALB 1 NaN NaN 2020 ALB 1 161 NaN 2018 GER 1 451 381 2019 GER NaN 451 381 2020 GER 1 361 321 I tried using the command df.fillna(axis=0, method='ffill') But this replace all values NaN by the previous, but this is not what I want because some values should be kept as NaN if the "Submitted" column value is 1. I would like to change the values by the previous row only if the respective "Submitted" value is "NaN". Thank you

Reputation: 160

Replace missing values based on value of a specific column in Python

I would like to replace missing values based on the values of the column Submitted.

Find below what I have:

Year	Country	Submitted	Age12	Age14
2018	CHI	1	267	NaN
2019	CHI	NaN	NaN	NaN
2020	CHI	1	244	203
2018	ALB	1	163	165
2019	ALB	1	NaN	NaN
2020	ALB	1	161	NaN
2018	GER	1	451	381
2019	GER	NaN	NaN	NaN
2020	GER	1	361	321

An this is what I would like to have:

Year	Country	Submitted	Age12	Age14
2018	CHI	1	267	NaN
2019	CHI	NaN	267	NaN
2020	CHI	1	244	203
2018	ALB	1	163	165
2019	ALB	1	NaN	NaN
2020	ALB	1	161	NaN
2018	GER	1	451	381
2019	GER	NaN	451	381
2020	GER	1	361	321

I tried using the command df.fillna(axis=0, method='ffill') But this replace all values NaN by the previous, but this is not what I want because some values should be kept as NaN if the "Submitted" column value is 1.

I would like to change the values by the previous row only if the respective "Submitted" value is "NaN".

Thank you

Upvotes: 1

Answers (4)

Shreyas Prakash

Reputation: 614

I just used a for loop to check and update the values in the dataframe

import pandas as pd
new_data = [[2018,'CHI',1,267,30], [2019,'CHI','NaN','NaN','NaN'], [2020,'CHI',1,244,203]]
df = pd.DataFrame(new_data, columns = ['Year','Country','Submitted','Age12','Age14'])
prevValue12 = df.iloc[0]['Age12']
prevValue14 = df.iloc[0]['Age14']
for index, row in df.iterrows():
    if(row['Submitted']=='NaN'):
        df.at[index,'Age12']=prevValue12
        df.at[index,'Age14']=prevValue14
    prevValue12 = row['Age12']
    prevValue14 = row['Age14']
print(df)

output

Year Country Submitted Age12 Age14
0  2018     CHI         1   267    30
1  2019     CHI       NaN   267    30
2  2020     CHI         1   244   203

Upvotes: 0

SeaBean

Reputation: 23227

You can use .filter() to select the related columns and put the columns in the list cols. Then, use .mask() to change the values of the selected columns by forward fill using ffill() when Submitted is NaN, as follows:

cols = df.filter(like='Age').columns

df[cols] = df[cols].mask(df['Submitted'].isna(), df[cols].ffill())

Result:

print(df)

   Year Country  Submitted  Age12  Age14
0  2018     CHI        1.0  267.0    NaN
1  2019     CHI        NaN  267.0    NaN
2  2020     CHI        1.0  244.0  203.0
3  2018     ALB        1.0  163.0  165.0
4  2019     ALB        1.0    NaN    NaN
5  2020     ALB        1.0  161.0    NaN
6  2018     GER        1.0  451.0  381.0
7  2019     GER        NaN  451.0  381.0
8  2020     GER        1.0  361.0  321.0

Upvotes: 0

Allen Qin

Reputation: 19957

You can do a conditional ffill() using np.where

import numpy as np
(
    df.assign(Age12=np.where(df.Submitted.isna(), df.Age12.ffill(), df.Age12))
    .assign(Age14=np.where(df.Submitted.isna(), df.Age14.ffill(), df.Age14))
)

Upvotes: 0

Ami Tavory

Reputation: 76406

Try using where together with what you did:

 df = df.where(~df.Sumbitted.isnull(), df.fillna(axis=0, method='ffill'))

This will replace the entries only when Submitted is null.

Upvotes: 1

Replace missing values based on value of a specific column in Python

Answers (4)

Related Questions