Don Quixote

Reputation: 145

I am trying to fill all NaN values in columns with numeric data types with zero in pandas

I have a DataFrame with a mixture of string and float columns. The float columns all hold whole numbers and were only converted to floats because there were missing values. I want to fill the NaNs in the numeric columns with zero while leaving the NaNs in the string columns alone. Here is what I have currently.

df.select_dtypes(include=['int', 'float']).fillna(0, inplace=True)

This doesn't work, and I think it is because .select_dtypes() returns a view of the DataFrame, so the .fillna() has no effect. Is there a similar method that fills the NaNs in only the float columns?

Upvotes: 5

Views: 4899

Answers (3)

Nickil Maveli

Reputation: 29711

Use either DF.combine_first (does not act inplace):

df.combine_first(df.select_dtypes(include=[np.number]).fillna(0))

or DF.update (modifies inplace):

df.update(df.select_dtypes(include=[np.number]).fillna(0))

fillna fails because DF.select_dtypes returns a completely new DataFrame. Although it holds a subset of the original DF's columns, it is not part of the original; it behaves as an independent object, so modifications made to it do not affect the DF it was derived from.

Note that np.number selects all numeric types.
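For reference, here is a minimal sketch of both calls on a small made-up frame. The column names and values are illustrative assumptions, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one string column and two float columns with gaps.
df = pd.DataFrame({
    "name": ["a", np.nan, "c"],
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, 2.0, np.nan],
})

# select_dtypes hands back a separate DataFrame, so fill it and keep the result.
numeric_filled = df.select_dtypes(include=[np.number]).fillna(0)

# combine_first returns a new frame; df itself is left untouched.
combined = df.combine_first(numeric_filled)

# update overwrites df in place with the non-NaN values of numeric_filled.
df.update(numeric_filled)
```

In both results the NaN in the string column is preserved, because that column never appears in numeric_filled.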

Upvotes: 5

Vaishali

Reputation: 38415

Consider a dataframe like this

    col1    col2    col3    id
0   1       1       1       a
1   0       NaN     1       a
2   NaN     1       1       NaN
3   1       0       1       b

You can select the numeric columns, fill their NaNs, and assign the result back:

num_cols = df.select_dtypes(include=[np.number]).columns
df[num_cols]=df.select_dtypes(include=[np.number]).fillna(0)


    col1    col2    col3    id
0   1       1       1       a
1   0       0       1       a
2   0       1       1       NaN
3   1       0       1       b
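A self-contained version of the same idea, as a sketch: it rebuilds the sample frame shown above and, as an optional extra step that is not part of the answer, casts the filled columns back to integers, since the question says the floats are all whole numbers.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the answer.
df = pd.DataFrame({
    "col1": [1, 0, np.nan, 1],
    "col2": [1, np.nan, 1, 0],
    "col3": [1, 1, 1, 1],
    "id": ["a", "a", np.nan, "b"],
})

# Names of the numeric columns only; "id" is excluded.
num_cols = df.select_dtypes(include=[np.number]).columns

# Fill NaNs in those columns and assign the result back to the original frame.
df[num_cols] = df[num_cols].fillna(0)

# Optional extra step: with no NaNs left, the numeric columns can be cast to integers.
df = df.astype({col: "int64" for col in num_cols})
```

The cast only works after the fill, because a plain integer column cannot hold NaN.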

Upvotes: 0

boot-scootin

Reputation: 12515

Your pandas.DataFrame.select_dtypes approach is good; you've just got to cross the finish line:

>>> df = pd.DataFrame({'A': [np.nan, 'string', 'string', 'more string'], 'B': [np.nan, np.nan, 3, 4], 'C': [4, np.nan, 5, 6]})
>>> df
             A    B    C
0          NaN  NaN  4.0
1       string  NaN  NaN
2       string  3.0  5.0
3  more string  4.0  6.0

Don't try to perform the in-place fillna here (there's a time and place for inplace=True, but this is not one). select_dtypes returns a new dataframe rather than a view of df, so filling it in place never touches the original. Create a new dataframe called filled and join the filled (or "fixed") columns back with your original data:

>>> filled = df.select_dtypes(include=['int', 'float']).fillna(0)
>>> filled
     B    C
0  0.0  4.0
1  0.0  0.0
2  3.0  5.0
3  4.0  6.0
>>> df = df.join(filled, rsuffix='_filled')
>>> df
             A    B    C  B_filled  C_filled
0          NaN  NaN  4.0       0.0       4.0
1       string  NaN  NaN       0.0       0.0
2       string  3.0  5.0       3.0       5.0
3  more string  4.0  6.0       4.0       6.0

Then you can drop whatever original columns you had to keep only the "filled" ones:

>>> df.drop([x[:x.find('_filled')] for x in df.columns if '_filled' in x], axis=1, inplace=True)
>>> df
             A  B_filled  C_filled
0          NaN       0.0       4.0
1       string       0.0       0.0
2       string       3.0       5.0
3  more string       4.0       6.0

Upvotes: 3
