Reputation: 61
I am trying to replace the nan values in a dataframe column 'Functional' using the fillna() function. The issues I am facing are below:
I can find the missing rows with isnull():
dfcomp[dfcomp['Functional'].isnull()==True]
Looking up one of those rows by index shows the value as nan:
dfcomp['Functional'][2216]
But when I try to replace the values with fillna(), nothing happens. Even after running the fillna statement I can rerun the first statement and see the same 2 nan instances.
dfcomp['Functional']=dfcomp['Functional'].fillna(value=dfcomp['Functional'].mode())
I have tried both versions, btw:
dfcomp['Functional'].fillna(value=dfcomp['Functional'].mode(),inplace=True)
I also tried the replace() function for this, but no luck:
dfcomp['Functional']=dfcomp['Functional'].replace({'nan':dfcomp['Functional'].mode()})
Is there something wrong with my code? Why is fillna() not recognizing the nan when isnull() can?
Also, why does the index lookup show the value as nan, yet when I try to replace that same value using replace() nothing changes?
How can I replace the nan values when fillna() is not able to recognize them?
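A minimal reproduction of the behaviour described above, assuming a small hypothetical DataFrame in place of the real dfcomp (the values 'Typ' and 'Min1' are made up). It shows isnull() finding the gaps, mode() returning a one-row Series with index label 0, fillna() with that Series filling nothing, and replace() on the string 'nan' also changing nothing, because the string 'nan' is not the same thing as a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfcomp: two missing values, at index labels 2 and 4
dfcomp = pd.DataFrame({'Functional': ['Typ', 'Min1', np.nan, 'Typ', np.nan, 'Typ']})

# isnull() finds both missing rows
print(dfcomp[dfcomp['Functional'].isnull()])

# mode() returns a one-row Series (index label 0), not a scalar
modal = dfcomp['Functional'].mode()
print(modal)

# fillna() aligns a Series fill value on index labels: only label 0 could be
# filled, and row 0 is not missing, so both gaps remain
filled = dfcomp['Functional'].fillna(value=modal)
print(filled.isnull().sum())  # 2

# The string 'nan' never equals a real missing value, so replace() changes nothing
replaced = dfcomp['Functional'].replace({'nan': 'Typ'})
print(replaced.isnull().sum())  # 2
```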
Upvotes: 6
Views: 16852
Reputation: 2019
Essentially the problem is the return type of dfcomp['Functional'].mode(). This is a single-element pandas.Series, and fillna() expects either a scalar or a dict/Series/DataFrame whose index labels match the rows you are trying to fill; values are matched by label, not by position.
You need to calculate the mode of the column and then pass that scalar to the fillna() method.
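mode = dfcomp['Functional'].mode().values[0]
# assign back instead of inplace=True on a column selection, which may not update dfcomp under Copy-on-Write
dfcomp['Functional'] = dfcomp['Functional'].fillna(value=mode)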
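A self-contained sketch of this fix, again on hypothetical data standing in for dfcomp. The scalar taken with .values[0] is applied to every missing row regardless of index. When several values tie for the mode, mode() returns them in sorted order, so .values[0] picks the first of them.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for dfcomp['Functional']
dfcomp = pd.DataFrame({'Functional': ['Typ', 'Min1', np.nan, 'Typ', np.nan, 'Typ']})

# Take the first modal value as a plain scalar ('Typ' here)
mode = dfcomp['Functional'].mode().values[0]

# A scalar fill value is applied to every missing row, regardless of index
dfcomp['Functional'] = dfcomp['Functional'].fillna(value=mode)
print(dfcomp['Functional'].tolist())
# ['Typ', 'Min1', 'Typ', 'Typ', 'Typ', 'Typ']
```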
Upvotes: 4
Reputation: 59519
This is an index-alignment problem. pd.Series.mode always returns a Series, even if only one value is returned. The index of this Series is therefore a RangeIndex (with one entry per value tied for the mode), so when you pass it to .fillna, pandas tries to align on the index, which mostly doesn't line up with your DataFrame's index.
You want to select the modal value itself, so use .iloc:
dfcomp['Functional'] = dfcomp['Functional'].fillna(dfcomp['Functional'].mode().iloc[0])
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame({'foo': np.random.choice([1,2,3,np.nan], 7)})
df['foo'].mode()
#0 3.0
#dtype: float64
# Nothing gets filled because only the row with Index 0 could possibly
# be filled and it wasn't missing to begin with
df['foo'].fillna(df['foo'].mode())
#0 3.0
#1 NaN
#2 1.0
#3 3.0
#4 3.0
#5 NaN
#6 1.0
#Name: foo, dtype: float64
# This fills the `NaN` with 3 regardless of index
df['foo'].fillna(df['foo'].mode().iloc[0])
#0 3.0
#1 3.0
#2 1.0
#3 3.0
#4 3.0
#5 3.0
#6 1.0
#Name: foo, dtype: float64
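To see the alignment when several values tie for the mode, here is a small sketch with a hypothetical Series s. Passing the whole mode() result to fillna fills a row only if its index label also appears in that result, which can silently fill some gaps and not others; taking .iloc[0] avoids this.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two missing values (labels 1 and 5)
s = pd.Series([1.0, np.nan, 2.0, 2.0, 1.0, np.nan])

# 1.0 and 2.0 are tied, so mode() returns both, sorted, with labels 0 and 1
modes = s.mode()
print(modes.tolist())  # [1.0, 2.0]

# Filling with the whole Series aligns on labels: label 1 exists in `modes`
# (value 2.0) so row 1 is filled, but label 5 does not, so row 5 stays NaN
partial = s.fillna(modes)
print(partial.tolist())  # [1.0, 2.0, 2.0, 2.0, 1.0, nan]

# A scalar from .iloc[0] fills every gap with the first modal value
print(s.fillna(modes.iloc[0]).tolist())  # [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
```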
Upvotes: 2
Reputation: 1631
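In order to fill NaN values, you can use the following code (note that this fills every NaN in every column with 0, not with the mode):
dfcomp = dfcomp.fillna(value=0)
Later update:
dfcomp['Functional'] = dfcomp['Functional'].fillna(dfcomp['mode'])
(This assumes dfcomp already has a column named 'mode'; without one, dfcomp['mode'] raises a KeyError.)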
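The later update can work because a column of dfcomp shares dfcomp's index, so every row has a matching fill value. Since the question's frame has no such column, the sketch below assumes a hypothetical 'mode' helper column created by broadcasting the scalar mode to every row, and drops it after filling.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfcomp
dfcomp = pd.DataFrame({'Functional': ['Typ', np.nan, 'Min1', 'Typ', np.nan]})

# Hypothetical helper column: the scalar mode ('Typ') broadcast to every row,
# so it carries exactly the same index as dfcomp
dfcomp['mode'] = dfcomp['Functional'].mode().iloc[0]

# Both sides share an index, so each missing row finds its fill value
dfcomp['Functional'] = dfcomp['Functional'].fillna(dfcomp['mode'])
dfcomp = dfcomp.drop(columns='mode')

print(dfcomp['Functional'].tolist())  # ['Typ', 'Typ', 'Min1', 'Typ', 'Typ']
```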
Upvotes: -1