Reputation: 375
I am trying to impute missing values using the most frequent value by a group using the pandas
module in Python. After checking some posts on Stack Overflow, I have managed to achieve that:
import numpy as np
import pandas as pd
df = pd.DataFrame({"group": ["A", "A", "A", "A", "B", "B", "B"],
"value": [1, 1, 1, np.nan, 2, np.nan, np.nan]})
df.groupby("group").transform(lambda x: x.fillna(x.mode().iloc[0]))
Running this code will fill with 1 the missing entry for group "A" and with 2 both missing entries for group "B". However, let's assume that one of the groups contains only missing data (in this case group "B"):
df1 = pd.DataFrame({"group": ["A", "A", "A", "A", "B", "B", "B"],
"value": [1, 1, 1, np.nan, np.nan, np.nan, np.nan]})
df1.groupby("group").transform(lambda x: x.fillna(x.mode().iloc[0]))
Running the code above will prompt an IndexError: single positional indexer is out-of-bounds
. I would expect the normal behaviour is to keep np.nan because if you run the method mode
for just, let's say, group "B" from df1
:
df1[df1.group == "B"].mode()
I will get that NaN is the most common value. How can I avoid this issue?
Upvotes: 2
Views: 2389
Reputation: 18306
Running the code above will prompt an IndexError: single positional indexer is out-of-bounds
This is because transform
gets to be passed each column as a series and at some point it will see the value
column on its own; and if you do:
df1[df1.group == "B"].value.mode()
you get
Series([], dtype: float64)
hence the index-out-of-bounds like error as it is empty and iloc[0]
doesn't exist.
OTOH, when you do:
df1[df1.group == "B"].mode()
mode
is calculated on a dataframe not a series and pandas decides to give a NaN on the full-NaN column i.e. value
column here.
So one remedy is to use apply
instead of transform
to pass a dataframe instead of individual series to your lambda
:
df1.groupby("group").apply(lambda x: x.fillna(x.mode().iloc[0])).reset_index(drop=True)
to get
group value
0 A 1.0
1 A 1.0
2 A 1.0
3 A 1.0
4 B NaN
5 B NaN
6 B NaN
Upvotes: 1