Reputation: 764
I have an Age column in a dataframe that contains NaN values. I am trying to replace them with the mean age of the matching group (grouped by Sex and Typ), but it does not work and I am not sure why. The script below reproduces the problem: in the output printed after the apply call, I am still seeing null values.
import pandas as pd
import numpy as np

def find_mean_age(Sex,Typ):
    return temp.loc[(temp['Sex']==Sex)&(temp['Typ']==Typ),'Age']

if __name__ =='__main__':
    df = pd.DataFrame({'Id':[1,2,3,4,5,6],
                       'Sex':['male','male','female','male','female','female'],
                       'Age':[21,float('Nan'),float('Nan'),23,56,32],
                       'Typ':['A','A','V','V','V','V']})
    print(df)
    temp = df.loc[(df['Age'].notnull())&(df['Age'] < 65 ),
                  ['Age','Sex','Typ']].groupby(['Sex','Typ'],as_index=False).mean()
    df.loc[df['Age'].isnull(), ['Age']] = df.apply(lambda row: find_mean_age(row['Sex'], row['Typ'][0]),
                                                   axis=1)
    print(df)
Output
   Id     Sex   Age Typ
0   1    male  21.0   A
1   2    male   NaN   A
2   3  female   NaN   V
3   4    male  23.0   V
4   5  female  56.0   V
5   6  female  32.0   V
Upvotes: 1
Views: 441
Reputation: 451
Your function returns a Series object, not a single value, which is why the NaN values are not replaced. It is easy to fix your code:
def find_mean_age(Sex, Typ):
    return temp.loc[(temp['Sex'] == Sex) & (temp['Typ'] == Typ)]['Age'].tolist()[0]
This produces:
   Id     Sex   Age Typ
0   1    male  21.0   A
1   2    male  21.0   A
2   3  female  44.0   V
3   4    male  23.0   V
4   5  female  56.0   V
5   6  female  32.0   V
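To make the difference concrete, here is a small sketch comparing what the original and the fixed lookup return. The `temp` frame is written out by hand to match the group means from the question's data, and the names `mask`, `as_series` and `as_scalar` are only illustrative:

```python
import pandas as pd

# Group means as they would look in `temp` for the question's data
# (hand-written here so the snippet runs on its own).
temp = pd.DataFrame({'Sex': ['female', 'male', 'male'],
                     'Typ': ['V', 'A', 'V'],
                     'Age': [44.0, 21.0, 23.0]})

mask = (temp['Sex'] == 'male') & (temp['Typ'] == 'A')

# Original lookup: a one-element Series that still carries temp's index.
as_series = temp.loc[mask, 'Age']

# Fixed lookup: the first (and only) matching value as a plain float.
as_scalar = temp.loc[mask, 'Age'].tolist()[0]

print(type(as_series).__name__)  # Series
print(type(as_scalar).__name__)  # float
```

Because each row then yields a plain number, `df.apply(..., axis=1)` gives back one value per row, which can be assigned to the missing ages.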
It should be mentioned that @chris-a provided the most elegant solution to your problem.
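Independently of that answer, a row-wise apply can usually be avoided for this kind of group-wise fill. The following is only a sketch of one common vectorized approach, assuming the same data and the same "under 65" filter as in the question. The names `eligible_age` and `group_mean` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3, 4, 5, 6],
                   'Sex': ['male', 'male', 'female', 'male', 'female', 'female'],
                   'Age': [21, float('nan'), float('nan'), 23, 56, 32],
                   'Typ': ['A', 'A', 'V', 'V', 'V', 'V']})

# Keep only ages below 65 for the mean, as in the question.
# Series.where turns every other value into NaN, which mean() skips.
eligible_age = df['Age'].where(df['Age'] < 65)

# For every row, the mean age of its (Sex, Typ) group, aligned to df's index.
group_mean = eligible_age.groupby([df['Sex'], df['Typ']]).transform('mean')

# Fill only the missing ages; existing values are left untouched.
df['Age'] = df['Age'].fillna(group_mean)
print(df)
```

For the sample data this fills the two missing ages with 21.0 and 44.0, the same values as in the output above. A group with no usable ages would keep its NaN.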
Upvotes: 1