manohar864
manohar864

Reputation: 47

Is there any simpler method in pandas to replace null values other than loop in this case?

I'm trying to replace the null values with the mode of numbers in each group(groupby area). Is there any simpler method in Pandas to do this?

I've done it through loop method which is not recommended for very large datasets.

data = {'area':['abc', 'abc', 'abc', 'abc','bcd', 'bcd', 'bcd' , 
'cde','cde', 'cde', 'cde', 'cde'], 'number':[1, 2, 2, np.nan, 3, 3, 
np.nan, 5, 5, 4, np.nan, np.nan]}
df = pd.DataFrame(data)

Hell = pd.DataFrame(df.groupby("area")['number'].agg(lambda 
x:x.value_counts().index[0])).reset_index()

import math
for index in range(0,df.shape[0]):   
    if math.isnan(df['number'][index]):
        for i in range(0,Hell.shape[0]):
            if Hell['area'][i] == df['area'][index]:
                df['number'][index] = Hell['number'][i]

Upvotes: 1

Views: 197

Answers (1)

jezrael
jezrael

Reputation: 863166

Use Series.fillna with Series.map for replace by matched values by area:

df['number'] = df['number'].fillna(df['area'].map(Hell.set_index('area')['number']))
print (df)
   area  number
0   abc     1.0
1   abc     2.0
2   abc     2.0
3   abc     2.0
4   bcd     3.0
5   bcd     3.0
6   bcd     3.0
7   cde     5.0
8   cde     5.0
9   cde     4.0
10  cde     5.0
11  cde     5.0

Or use your solution with GroupBy.transform for same Series like original filled by aggregate values:

s = df.groupby("area")['number'].transform(lambda x: x.value_counts().index[0])
#alternative
#s = df.groupby("area")['number'].transform(lambda x: x.mode().iat[0])
df['number'] = df['number'].fillna(s)
print (df)
   area  number
0   abc     1.0
1   abc     2.0
2   abc     2.0
3   abc     2.0
4   bcd     3.0
5   bcd     3.0
6   bcd     3.0
7   cde     5.0
8   cde     5.0
9   cde     4.0
10  cde     5.0
11  cde     5.0

Upvotes: 2

Related Questions