PVL

Reputation: 61

Why isn't fillna() with the mode replacing NaN values in the DataFrame?

I am trying to replace the NaN values in the DataFrame column 'Functional' using the fillna() function. The issues I am facing are below:

  1. I am able to detect the null values using isnull():

dfcomp[dfcomp['Functional'].isnull()==True]

[Screenshot: search for null values]

  2. Using the index from above, I looked up the actual value:

dfcomp['Functional'][2216]

[Screenshot: value lookup using the index]

  3. But when I try to fill the NaN values using fillna(), nothing happens. Even after running the fillna statement, I can rerun the first statement and see the same 2 NaN instances.

dfcomp['Functional']=dfcomp['Functional'].fillna(value=dfcomp['Functional'].mode())

I have tried both versions, by the way:

dfcomp['Functional'].fillna(value=dfcomp['Functional'].mode(),inplace=True)

[Screenshot: the fillna() call]

  4. I also tried using the replace() function for this, but no luck:

dfcomp['Functional']=dfcomp['Functional'].replace({'nan':dfcomp['Functional'].mode()})
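A likely reason the replace() attempt has no effect: the dict key 'nan' is the string "nan", while a missing value is a float NaN, so nothing matches. The sketch below uses a hypothetical stand-in Series (the real dfcomp data is not shown) and assumes the goal is to fill with a scalar mode.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfcomp['Functional']; values are illustrative only
s = pd.Series(['x', 'y', np.nan, 'x', np.nan])

# The key 'nan' is the *string* "nan"; missing values are float NaN, so nothing matches
unchanged = s.replace({'nan': 'x'})

# Matching the actual missing-value marker with a scalar does replace it
fill_value = s.mode().iloc[0]  # a scalar, not a Series
replaced = s.replace(np.nan, fill_value)

print(unchanged.isna().sum())  # 2
print(replaced.isna().sum())   # 0
```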

Is there something wrong with my code? Why is fillna() not recognizing the NaN when isnull() can? Also, why does the index lookup show the value as nan, yet replace() has no effect on that same value?

How can I replace the NaN values when fillna() is not able to recognize them?

Upvotes: 6

Views: 16852

Answers (3)

nickyfot

Reputation: 2019

Essentially, the problem is the return type of dfcomp['Functional'].mode(): it is a single-element pandas.Series, while fillna() expects either a scalar or a dict/Series/DataFrame whose labels line up with the column you are trying to fill (a Series is matched by index, not by position).

You need to extract the mode of the column as a scalar and then pass that scalar to the fillna() method.

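# .mode() returns a Series, so take its first value as a scalar
mode = dfcomp['Functional'].mode().values[0]
# Assign back instead of fillna(inplace=True) on a column selection
dfcomp['Functional'] = dfcomp['Functional'].fillna(value=mode)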
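A self-contained sketch of this approach, using a small hypothetical DataFrame in place of dfcomp (whose data the question does not show). It assigns the result back rather than calling fillna(inplace=True) on a column selection, since under pandas' Copy-on-Write behaviour such a chained in-place call may not update the original DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dfcomp
dfcomp = pd.DataFrame({'Functional': ['x', 'y', 'x', np.nan, 'x', np.nan]})

# .mode() returns a Series; .values[0] pulls out the scalar mode ('x' here)
mode = dfcomp['Functional'].mode().values[0]

# Assign the filled column back to the DataFrame
dfcomp['Functional'] = dfcomp['Functional'].fillna(value=mode)

print(dfcomp['Functional'].tolist())  # ['x', 'y', 'x', 'x', 'x', 'x']
```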

Upvotes: 4

ALollz

Reputation: 59519

This is an index alignment problem. pd.Series.mode always returns a Series, even if only one value is returned. The index of this Series is a RangeIndex (0 up to the number of values tied for the mode), so when you pass it to .fillna, pandas aligns on the index, and those labels mostly don't match the missing rows of your DataFrame.

You want to select the modal value itself, so use .iloc:

dfcomp['Functional'] = dfcomp['Functional'].fillna(dfcomp['Functional'].mode().iloc[0])
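When several values are tied for most frequent, mode() returns all of them (in sorted order), which is why the result is a Series at all. A small sketch with a hypothetical tied column shows that .iloc[0] still yields a single scalar, here the first of the sorted modes:

```python
import pandas as pd

# Hypothetical column in which 'a' and 'b' are tied for most frequent
s = pd.Series(['b', 'a', 'b', 'a', None])

modes = s.mode()               # all tied modes, sorted, with a RangeIndex
print(modes.tolist())          # ['a', 'b']
print(modes.index.tolist())    # [0, 1]

# .iloc[0] picks one scalar, so fillna does not try to align on an index
filled = s.fillna(modes.iloc[0])
print(filled.tolist())         # ['b', 'a', 'b', 'a', 'a']
```

If a specific tie-breaking rule matters for your data, choose it explicitly rather than relying on the sort order.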

MCVE

import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame({'foo': np.random.choice([1, 2, 3, np.nan], 7)})  # np.NaN was removed in NumPy 2.0

df['foo'].mode()
#0    3.0
#dtype: float64

# Nothing gets filled because only the row with Index 0 could possibly
# be filled and it wasn't missing to begin with
df['foo'].fillna(df['foo'].mode())
#0    3.0
#1    NaN
#2    1.0
#3    3.0
#4    3.0
#5    NaN
#6    1.0
#Name: foo, dtype: float64

# This fills the `NaN` with 3 regardless of index
df['foo'].fillna(df['foo'].mode().iloc[0])
#0    3.0
#1    3.0
#2    1.0
#3    3.0
#4    3.0
#5    3.0
#6    1.0
#Name: foo, dtype: float64
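To see the alignment directly, the sketch below hard-codes the values from the output above (rather than relying on the random seed) and passes fillna a Series whose index labels are exactly the missing rows. The fill value 9.0 is arbitrary and purely illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the MCVE output above, written out explicitly
s = pd.Series([3.0, np.nan, 1.0, 3.0, 3.0, np.nan, 1.0], name='foo')

# A fill Series labelled with the NaN positions (1 and 5) does fill them
fill = pd.Series(9.0, index=[1, 5])
print(s.fillna(fill).tolist())  # [3.0, 9.0, 1.0, 3.0, 3.0, 9.0, 1.0]

# mode() is labelled 0, and row 0 is not missing, so nothing changes
print(s.fillna(s.mode()).isna().sum())  # 2
```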

Upvotes: 2

Adrian B

Reputation: 1631

To fill NaN values, you can use the following code (note that it fills every NaN in the whole DataFrame with 0, not with the mode):

dfcomp = dfcomp.fillna(value=0)

Later update:

dfcomp['Functional'] = dfcomp['Functional'].fillna(dfcomp['Functional'].mode()[0])
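Unlike fillna(value=0), which fills every missing value in every column, fillna also accepts a dict mapping column names to fill values, so only the listed columns are touched. A minimal sketch, assuming a hypothetical frame where 'Other' stands in for the remaining columns of dfcomp:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'Other' represents the rest of dfcomp's columns
dfcomp = pd.DataFrame({
    'Functional': ['x', np.nan, 'x', 'y'],
    'Other': [1.0, np.nan, 3.0, 4.0],
})

# Dict form: only 'Functional' is filled, with its scalar mode
dfcomp = dfcomp.fillna({'Functional': dfcomp['Functional'].mode().iloc[0]})

print(dfcomp['Functional'].tolist())      # ['x', 'x', 'x', 'y']
print(int(dfcomp['Other'].isna().sum()))  # 1
```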

Upvotes: -1
