Is there any simpler method in pandas to replace null values other than loop in this case?

Question

I'm trying to replace the null values with the mode of numbers in each group(groupby area). Is there any simpler method in Pandas to do this?

I've done it through loop method which is not recommended for very large datasets.

data = {'area':['abc', 'abc', 'abc', 'abc','bcd', 'bcd', 'bcd' , 
'cde','cde', 'cde', 'cde', 'cde'], 'number':[1, 2, 2, np.nan, 3, 3, 
np.nan, 5, 5, 4, np.nan, np.nan]}
df = pd.DataFrame(data)

Hell = pd.DataFrame(df.groupby("area")['number'].agg(lambda 
x:x.value_counts().index[0])).reset_index()

import math
for index in range(0,df.shape[0]):   
    if math.isnan(df['number'][index]):
        for i in range(0,Hell.shape[0]):
            if Hell['area'][i] == df['area'][index]:
                df['number'][index] = Hell['number'][i]

jezrael · Accepted Answer

Use Series.fillna with Series.map for replace by matched values by area:

df['number'] = df['number'].fillna(df['area'].map(Hell.set_index('area')['number']))
print (df)
   area  number
0   abc     1.0
1   abc     2.0
2   abc     2.0
3   abc     2.0
4   bcd     3.0
5   bcd     3.0
6   bcd     3.0
7   cde     5.0
8   cde     5.0
9   cde     4.0
10  cde     5.0
11  cde     5.0

Or use your solution with GroupBy.transform for same Series like original filled by aggregate values:

s = df.groupby("area")['number'].transform(lambda x: x.value_counts().index[0])
#alternative
#s = df.groupby("area")['number'].transform(lambda x: x.mode().iat[0])
df['number'] = df['number'].fillna(s)
print (df)
   area  number
0   abc     1.0
1   abc     2.0
2   abc     2.0
3   abc     2.0
4   bcd     3.0
5   bcd     3.0
6   bcd     3.0
7   cde     5.0
8   cde     5.0
9   cde     4.0
10  cde     5.0
11  cde     5.0

Is there any simpler method in pandas to replace null values other than loop in this case?

Answers (1)

Related Questions