rubengura

Reputation: 469

How to replace NaN values on a pandas subset of columns?

I would like to replace the missing values in columns C and D of the following pd.DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [1, np.nan, 3],
                   "C": [np.nan, 2, 3],
                   "D": [1, 2, np.nan]})

df

   A    B    C    D
0  1  1.0  NaN  1.0
1  2  NaN  2.0  2.0
2  3  3.0  3.0  NaN

I can do it if I go column by column:

df["C"].fillna(0, inplace=True)
df["D"].fillna(0, inplace=True)

However, if I try to do it on both columns at the same time, I get a SettingWithCopyWarning ("A value is trying to be set on a copy of a slice from a DataFrame"):

df[["C", "D"]].fillna(0, inplace=True)

I have also tried to change the values using .loc, but it doesn't work either:

df.loc[:,["C", "D"]].fillna(0, inplace=True)

Is there any other way to replace the missing values in place without having to write a line of code for each column?

Upvotes: 1

Views: 3257

Answers (3)

Marc

Reputation: 11

The problem occurs because in one case you're working with a view of the dataframe and in the other with a copy.

df["C"] returns a Series that is a view into df, so changing it with .fillna(0, inplace=True) changes the actual df dataframe. (This relies on pandas' traditional, non-Copy-on-Write behavior; with Copy-on-Write enabled, which is the default from pandas 3.0, chained in-place calls like this no longer update df.)

df[["C", "D"]], however, returns a copy of part of df, as would df[["C"]], because selecting with a list of column labels (the double brackets) builds a new DataFrame. So if you change that with .fillna(0, inplace=True), only the copy is changed, and you don't see the change in the original df. That's why pandas gives the SettingWithCopyWarning.

The logic of whether the operation returns a view or a copy of the dataframe is not really intuitive. This has some details on that.
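A small sketch of the copy behavior described above, using the question's frame. The name `subset` is just an illustrative variable; the warning filter is only there because some pandas versions emit SettingWithCopyWarning on this line:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [1, np.nan, 3],
                   "C": [np.nan, 2, 3],
                   "D": [1, 2, np.nan]})

# Selecting with a list of labels gives a separate DataFrame.
subset = df[["C", "D"]]

with warnings.catch_warnings():
    # Older pandas versions warn (SettingWithCopyWarning) here; silence it for the demo.
    warnings.simplefilter("ignore")
    subset.fillna(0, inplace=True)

print(subset.isna().sum().sum())          # the subset has no NaNs left
print(df[["C", "D"]].isna().sum().sum())  # the original df still has its 2 NaNs
```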

Your own solution

df[["C", "D"]] = df[["C", "D"]].fillna(0)

works because you're making a copy, filling it with zeros, and assigning the result back to those columns of the original df.
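A minimal sketch of that assign-back pattern, plus the equivalent assignment through .loc (which is what the .loc attempt in the question was presumably aiming for). The helper `make_df` is hypothetical and just rebuilds the question's frame so both variants start from the same data:

```python
import numpy as np
import pandas as pd


def make_df():
    # Hypothetical helper: the same frame as in the question.
    return pd.DataFrame({"A": [1, 2, 3],
                         "B": [1, np.nan, 3],
                         "C": [np.nan, 2, 3],
                         "D": [1, 2, np.nan]})


cols = ["C", "D"]

# 1) Build a filled copy of the subset, then assign it back by column labels.
df1 = make_df()
df1[cols] = df1[cols].fillna(0)

# 2) Same idea through .loc: the assignment is done on the frame itself,
#    so no intermediate copy is being modified.
df2 = make_df()
df2.loc[:, cols] = df2.loc[:, cols].fillna(0)

print(df1)
print(df1.equals(df2))  # True
```

The key point in both variants is that the result of fillna is assigned back to df, instead of calling fillna(inplace=True) on a temporary selection.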

Another solution when there's a larger list of columns would be:

cols = ['C', 'D']
for c in cols:
    # Assign the result back instead of calling fillna(inplace=True) on df[c]
    df[c] = df[c].fillna(0)
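For a larger list, the column list can also be built programmatically and passed to a single assignment, without a loop. The selection rule below (fill everything except the columns in `keep_nan`) is a hypothetical example, not something from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [1, np.nan, 3],
                   "C": [np.nan, 2, 3],
                   "D": [1, 2, np.nan]})

# Hypothetical rule for this sketch: fill every column except those listed here.
keep_nan = ["B"]
cols = [c for c in df.columns if c not in keep_nan]

# One assignment covers any number of columns.
df[cols] = df[cols].fillna(0)
print(df)
```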

Upvotes: 1

Pygirl

Reputation: 13349

You can try:

fill_map = {col:0 for col in ['C', 'D']}
df = df.fillna(value=fill_map)
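Because fillna accepts a dict keyed by column label, each column can also get its own fill value, and columns not in the dict are left alone. The values 0 and -1 below are arbitrary, chosen only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [1, np.nan, 3],
                   "C": [np.nan, 2, 3],
                   "D": [1, 2, np.nan]})

# Keys are column labels, values are the fill value for that column.
# Columns missing from the dict (here A and B) keep their NaNs.
fill_map = {"C": 0, "D": -1}
filled = df.fillna(value=fill_map)
print(filled)
```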

df:

    A   B   C   D
0   1   1.0 0.0 1.0
1   2   NaN 2.0 2.0
2   3   3.0 3.0 0.0

Upvotes: 2

rubengura

Reputation: 469

While writing the question, I found a possible solution:

df[["C", "D"]] = df[["C", "D"]].fillna(0)

Upvotes: 1
