Reputation: 469
I would like to replace the missing values in columns C and D of the following pd.DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [1, np.nan, 3],
                   "C": [np.nan, 2, 3],
                   "D": [1, 2, np.nan]})
df
   A     B     C     D
0  1  1.00   nan  1.00
1  2   nan  2.00  2.00
2  3  3.00  3.00   nan
I can do it if I go column by column, replacing the values:
df["C"].fillna(0, inplace=True)
df["D"].fillna(0, inplace=True)
However, if I try to do it on both columns at the same time, I get a SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame:
df[["C", "D"]].fillna(0, inplace=True)
I have also tried to change the values using .loc, but it doesn't work either:
df.loc[:,["C", "D"]].fillna(0, inplace=True)
Is there any other way to replace the missing values in place without having to write a line of code for each column?
Upvotes: 1
Views: 3257
Reputation: 11
The problem occurs because in one case you're working with a view of the dataframe, in the other you are using a copy.
df["C"] returns a view into df (a Series), so changing it with .fillna(0, inplace=True) changes the actual df dataframe, at least in pandas versions that predate copy-on-write.
df[["C", "D"]], however, returns a copy of part of df, as would df[["C"]], incidentally, because of the double brackets. So if you change that with .fillna(0, inplace=True), only the copy gets changed, and you don't see the change in the original df. That's why pandas emits the SettingWithCopyWarning.
The logic of whether the operation returns a view or a copy of the dataframe is not really intuitive. This has some details on that.
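To make the copy behaviour concrete, here is a minimal sketch using the dataframe from the question. The variable name remaining is only for illustration, and the warning is silenced just to keep the output quiet:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [1, np.nan, 3],
                   "C": [np.nan, 2, 3],
                   "D": [1, 2, np.nan]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the warning for this demo only
    # The list selection builds a new DataFrame; fillna fills that
    # temporary object, which is then thrown away.
    df[["C", "D"]].fillna(0, inplace=True)

# Count the NaNs still present in the original columns.
remaining = int(df[["C", "D"]].isna().sum().sum())
print(remaining)
```

The count printed at the end shows that the missing values in the original df are still there.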
Your own solution
df[["C", "D"]] = df[["C", "D"]].fillna(0)
works, because you're making a copy, filling its missing values with zeros, and assigning the result back to the original df.
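Spelled out step by step, the assign-back approach might look like the sketch below. The names cols, filled, nans_before and nans_after are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [1, np.nan, 3],
                   "C": [np.nan, 2, 3],
                   "D": [1, 2, np.nan]})

cols = ["C", "D"]

# Step 1: fillna without inplace returns a new, filled DataFrame.
filled = df[cols].fillna(0)
nans_before = int(df[cols].isna().sum().sum())  # df itself is untouched so far

# Step 2: write the filled columns back into the original df.
df[cols] = filled
nans_after = int(df[cols].isna().sum().sum())

print(nans_before, nans_after)
```

Column B is not in cols, so its missing value is left alone.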
Another solution when there's a larger list of columns would be:
cols = ['C', 'D']
for c in cols:
    df[c] = df[c].fillna(0)  # assign back; newer pandas warns about chained inplace=True
Upvotes: 1
Reputation: 13349
You can try:
fill_map = {col:0 for col in ['C', 'D']}
df = df.fillna(value=fill_map)
df:
   A    B    C    D
0  1  1.0  0.0  1.0
1  2  NaN  2.0  2.0
2  3  3.0  3.0  0.0
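Since the question asks for an in-place replacement, the same dictionary can also be passed to fillna on the whole dataframe together with inplace=True. This is a sketch assuming the dataframe from the question; because fillna is called on df itself rather than on a selection, no copy is involved:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [1, np.nan, 3],
                   "C": [np.nan, 2, 3],
                   "D": [1, 2, np.nan]})

# Map each target column to its fill value; columns not listed are skipped.
fill_map = {col: 0 for col in ["C", "D"]}

# Called on df directly, so the original object is modified.
df.fillna(value=fill_map, inplace=True)
print(df)
```

The dictionary form also allows a different fill value per column, for example {"C": 0, "D": -1}.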
Upvotes: 2
Reputation: 469
While writing the question, I found a possible solution:
df[["C", "D"]] = df[["C", "D"]].fillna(0)
Upvotes: 1