Infinite

Reputation: 764

Correct usage of the apply function to remove NaN

I have an Age column in a DataFrame that contains NaN values. I am trying to replace them with the mean age of the matching group (grouped by Sex and Typ), and I am not sure why it does not work. The script below reproduces the problem: the output printed after the apply call still contains null values.

import pandas as pd
import numpy as np



def find_mean_age(Sex,Typ):
    return temp.loc[(temp['Sex']==Sex)&(temp['Typ']==Typ),'Age']

if __name__ =='__main__':
    df = pd.DataFrame({'Id':[1,2,3,4,5,6],
                   'Sex':['male','male','female','male','female','female'],
                   'Age':[21,float('Nan'),float('Nan'),23,56,32],
                   'Typ':['A','A','V','V','V','V']})
    print(df)

    temp = df.loc[(df['Age'].notnull())&(df['Age'] < 65 ),
                ['Age','Sex','Typ']].groupby(['Sex','Typ'],as_index=False).mean()

    df.loc[df['Age'].isnull(), ['Age']] = df.apply(lambda row: find_mean_age(row['Sex'], row['Typ'][0]),
                                                   axis=1)

    print(df)

Output

   Id     Sex   Age Typ
0   1    male  21.0   A
1   2    male   NaN   A
2   3  female   NaN   V
3   4    male  23.0   V
4   5  female  56.0   V
5   6  female  32.0   V

Upvotes: 1

Views: 441

Answers (1)

Roman

Reputation: 451

Your function returns a Series object, not a single value, so the assignment has no scalar to put into each row. The fix is small:

def find_mean_age(Sex, Typ):
    # Select the matching group's mean and return it as a scalar
    return temp.loc[(temp['Sex'] == Sex) & (temp['Typ'] == Typ), 'Age'].iloc[0]

This produces:

   Id     Sex   Age Typ
0   1    male  21.0   A
1   2    male  21.0   A
2   3  female  44.0   V
3   4    male  23.0   V
4   5  female  56.0   V
5   6  female  32.0   V
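
To see why the original version left the NaN values in place, it may help to compare what the two lookups return. This is a minimal sketch: the `temp` frame is written out by hand with the group means from the question rather than computed, and the names `mask`, `as_series` and `as_scalar` are only for illustration.

```python
import pandas as pd

# Group means as they would come out of the groupby in the question
# (values typed in by hand for this illustration)
temp = pd.DataFrame({'Sex': ['female', 'male', 'male'],
                     'Typ': ['V', 'A', 'V'],
                     'Age': [44.0, 21.0, 23.0]})

mask = (temp['Sex'] == 'male') & (temp['Typ'] == 'A')

as_series = temp.loc[mask, 'Age']          # one-element Series, keeps temp's index
as_scalar = temp.loc[mask, 'Age'].iloc[0]  # a single number

print(type(as_series).__name__)
print(as_scalar)
```

Note that both `.iloc[0]` and `.tolist()[0]` raise an IndexError when no group matches, so a row whose (Sex, Typ) pair has no valid ages would need separate handling.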

It should be mentioned that @chris-a provided the most elegant solution to your problem.
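
For reference, one common way to do this without a row-wise apply is `groupby(...).transform('mean')` combined with `fillna`. This is only a sketch of that general approach and not necessarily what @chris-a suggested; the names `eligible` and `group_mean` are introduced here for illustration, and the `Age < 65` filter is carried over from the question.

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3, 4, 5, 6],
                   'Sex': ['male', 'male', 'female', 'male', 'female', 'female'],
                   'Age': [21, float('nan'), float('nan'), 23, 56, 32],
                   'Typ': ['A', 'A', 'V', 'V', 'V', 'V']})

# Keep only ages below 65 for the averages, as in the question;
# everything else becomes NaN and is skipped by mean()
eligible = df['Age'].where(df['Age'] < 65)

# Per-row mean of the row's (Sex, Typ) group, aligned to df's index
group_mean = eligible.groupby([df['Sex'], df['Typ']]).transform('mean')

# Fill only the missing ages; existing values are left untouched
df['Age'] = df['Age'].fillna(group_mean)

print(df)
```

Because `transform` returns a result aligned to the original index, there is no need for a lookup function or an intermediate `temp` frame. A group with no valid ages simply stays NaN instead of raising an error.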

Upvotes: 1
