Reputation: 47
I'm trying to replace the null values with the mode of numbers in each group(groupby area). Is there any simpler method in Pandas to do this?
I've done it through loop method which is not recommended for very large datasets.
data = {'area':['abc', 'abc', 'abc', 'abc','bcd', 'bcd', 'bcd' ,
'cde','cde', 'cde', 'cde', 'cde'], 'number':[1, 2, 2, np.nan, 3, 3,
np.nan, 5, 5, 4, np.nan, np.nan]}
df = pd.DataFrame(data)
Hell = pd.DataFrame(df.groupby("area")['number'].agg(lambda
x:x.value_counts().index[0])).reset_index()
import math
for index in range(0,df.shape[0]):
if math.isnan(df['number'][index]):
for i in range(0,Hell.shape[0]):
if Hell['area'][i] == df['area'][index]:
df['number'][index] = Hell['number'][i]
Upvotes: 1
Views: 197
Reputation: 863166
Use Series.fillna
with Series.map
for replace by matched values by area
:
df['number'] = df['number'].fillna(df['area'].map(Hell.set_index('area')['number']))
print (df)
area number
0 abc 1.0
1 abc 2.0
2 abc 2.0
3 abc 2.0
4 bcd 3.0
5 bcd 3.0
6 bcd 3.0
7 cde 5.0
8 cde 5.0
9 cde 4.0
10 cde 5.0
11 cde 5.0
Or use your solution with GroupBy.transform
for same Series like original filled by aggregate values:
s = df.groupby("area")['number'].transform(lambda x: x.value_counts().index[0])
#alternative
#s = df.groupby("area")['number'].transform(lambda x: x.mode().iat[0])
df['number'] = df['number'].fillna(s)
print (df)
area number
0 abc 1.0
1 abc 2.0
2 abc 2.0
3 abc 2.0
4 bcd 3.0
5 bcd 3.0
6 bcd 3.0
7 cde 5.0
8 cde 5.0
9 cde 4.0
10 cde 5.0
11 cde 5.0
Upvotes: 2