Reputation: 121
I'm completely new to Python (and this website) and am currently trying to replace NA values in specific dataframe columns with their mode. I've tried various methods which are not working. Please help me spot what I'm doing incorrectly:
Note: All the columns I'm working with are float64 types. All my code runs, but when I check the number of nulls in the columns with df[cols_mode].isnull().sum(), it remains the same.
Method 1:
cols_mode = ['race', 'goal', 'date', 'go_out', 'career_c']
df[cols_mode].apply(lambda x: x.fillna(x.mode, inplace=True))
I tried the Imputer method too, but got the same result.
Method 2:
for column in df[['race', 'goal', 'date', 'go_out', 'career_c']]:
    mode = df[column].mode()
    df[column] = df[column].fillna(mode)
Method 3:
df['race'].fillna(df.race.mode(), inplace=True)
df['goal'].fillna(df.goal.mode(), inplace=True)
df['date'].fillna(df.date.mode(), inplace=True)
df['go_out'].fillna(df.go_out.mode(), inplace=True)
df['career_c'].fillna(df.career_c.mode(), inplace=True)
Method 4: My methods became more and more manual, and finally this one works:
df['race'].fillna(2.0, inplace=True)
df['goal'].fillna(1.0, inplace=True)
df['date'].fillna(6.0, inplace=True)
df['go_out'].fillna(2.0, inplace=True)
df['career_c'].fillna(2.0, inplace=True)
Upvotes: 12
Views: 52275
Reputation: 1
For single-column imputation:
df['col'] = df['col'].fillna(df['col'].mode()[0])
If you want to apply the same to a list of columns, loop over the list.
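The loop over a list of columns could be sketched roughly as below. The helper name fill_with_mode and the sample data are made up for illustration, and the guard for all-NaN columns is an addition of mine: mode() returns an empty Series for such a column, so indexing it with [0] would fail.

```python
import numpy as np
import pandas as pd


def fill_with_mode(df, columns):
    """Return a copy of df with NaN in the given columns replaced by each column's mode."""
    out = df.copy()
    for col in columns:
        modes = out[col].mode()  # most frequent value(s); NaN is ignored by default
        if modes.empty:          # column is entirely NaN, nothing to fill with
            continue
        # On ties mode() is sorted, so iloc[0] picks the smallest mode
        out[col] = out[col].fillna(modes.iloc[0])
    return out


# Hypothetical sample data
df = pd.DataFrame({
    "race": [2.0, 2.0, np.nan, 4.0],
    "goal": [1.0, np.nan, 1.0, 3.0],
    "empty": [np.nan, np.nan, np.nan, np.nan],
})
filled = fill_with_mode(df, ["race", "goal", "empty"])
```

Returning a copy leaves the original frame untouched, which makes it easy to compare null counts before and after.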
Upvotes: 0
Reputation: 51
Why not use a dictionary for your columns and pass that through instead?
dic = {'race': 2.0, 'goal': 1.0, 'date': 6.0, 'go_out': 2.0, 'career_c': 2.0}
# fillna returns a new DataFrame, so assign the result back
df = df.fillna(value=dic)
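If you would rather not hard-code the values, the same dictionary can probably be built from the data itself. A minimal sketch, assuming small made-up sample data and the names cols and mode_by_col, which are mine:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "race": [2.0, 2.0, np.nan, 4.0],
    "goal": [1.0, np.nan, 1.0, 3.0],
    "date": [6.0, 6.0, 5.0, np.nan],
})
cols = ["race", "goal", "date"]

# One mode per column; mode()[0] is the first (smallest) mode on ties
mode_by_col = {col: df[col].mode()[0] for col in cols}

# fillna returns a new DataFrame, so the result is assigned back
df = df.fillna(value=mode_by_col)
```

Columns that are not keys in the dictionary are left as they are.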
Upvotes: 0
Reputation: 1
Alternatively, I used a second DataFrame containing only the modes of the columns. However, you need to make sure that NaN is not the mode of any of the columns.
# Create the mode DataFrame
df_mode = df.mode()
# Loop over the columns and fill each one with its first mode
for x in df.columns.values:
    df[x] = df[x].fillna(value=df_mode[x].iloc[0])
You can also use the inplace option. This was useful when working with large data sets: I simply created one DataFrame holding the mean, mode, and median of every column.
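A summary frame like the one described might look roughly like this. It is only a sketch on made-up data; the names stats and df_filled are mine, not from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "race": [2.0, 2.0, np.nan, 4.0],
    "goal": [1.0, np.nan, 1.0, 3.0],
})

# One row per column of df, one statistic per field; all three skip NaN by default
stats = pd.DataFrame({
    "mean": df.mean(),
    "median": df.median(),
    "mode": df.mode().iloc[0],  # first row of df.mode() holds the first mode of each column
})

# A Series indexed by column name tells DataFrame.fillna which value to use per column
df_filled = df.fillna(stats["mode"])
```

Swapping stats["mode"] for stats["mean"] or stats["median"] switches the imputation strategy without touching the rest of the code.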
Upvotes: 0
Reputation: 3872
mode returns a Series, so you still need to access the row you want before replacing NaN values in your DataFrame.
for column in ['race', 'goal', 'date', 'go_out', 'career_c']:
    df[column].fillna(df[column].mode()[0], inplace=True)
If you want to apply it to all the columns of the DataFrame, then:
for column in df.columns:
    df[column].fillna(df[column].mode()[0], inplace=True)
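To see why passing the Series itself does not do what you expect, here is a small sketch on made-up data. fillna aligns a Series argument on index labels, so only the row whose label matches the mode's index gets filled.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with NaN at labels 0 and 3
s = pd.Series([np.nan, 2.0, 2.0, np.nan, 4.0])

mode = s.mode()                 # a Series with index [0] holding 2.0
by_series = s.fillna(mode)      # aligned on index labels: only label 0 is filled
by_scalar = s.fillna(mode[0])   # a scalar fills every NaN

missing_after_series = int(by_series.isna().sum())  # 1
missing_after_scalar = int(by_scalar.isna().sum())  # 0
```

On recent pandas versions, calling fillna with inplace=True on df[column] emits a chained-assignment warning, so assigning the result back with df[column] = df[column].fillna(...) is likely the safer form there.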
Upvotes: 28