Qasim Khan

Reputation: 154

Trying to apply a function on a Pandas DataFrame in Python

I'm trying to apply this function to fill the missing values in the Age column based on the Pclass and Sex columns, but it raises an error. How can I make it work?

def fill_age():
    Age = train['Age']
    Pclass = train['Pclass']
    Sex = train['Sex']

    if pd.isnull(Age):
        if Pclass == 1:
            return 34.61
        elif (Pclass == 1) and (Sex == 'male'):
            return 41.2813 
        elif (Pclass == 2) and (Sex == 'female'):
            return 28.72
        elif (Pclass == 2) and (Sex == 'male'):
            return 30.74
        elif (Pclass == 3) and (Sex == 'female'):
            return 21.75 
        elif (Pclass == 3) and (Sex == 'male'):
            return 26.51 
        else:
            pass
    else:
        return Age 


train['Age'] = train['Age'].apply(fill_age(),axis=1)

I'm getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 0

Views: 580

Answers (1)

Celius Stingher

Reputation: 18367

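The error occurs because your function reads whole columns from train, so pd.isnull(Age) and Pclass == 1 each return a Series, and an if statement cannot reduce a Series to a single True or False. If you want to use apply, the function should take a parameter x representing one row, which apply passes in through a lambda, and read its values from that row. Keep the parentheses around each comparison (which you already did). I have also replaced the boolean operator and with the bitwise operator &, which is required when combining element-wise Series comparisons; for the scalar values of a single row, and works equally well: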

def fill_age(x):
    Age = x['Age']
    Pclass = x['Pclass']
    Sex = x['Sex']

    if pd.isnull(Age):
        if (Pclass == 1) & (Sex == 'female'):
            return 34.61
        elif (Pclass == 1) & (Sex == 'male'):
            return 41.2813 
        elif (Pclass == 2) & (Sex == 'female'):
            return 28.72
        elif (Pclass == 2) & (Sex == 'male'):
            return 30.74
        elif (Pclass == 3) & (Sex == 'female'):
            return 21.75 
        elif (Pclass == 3) & (Sex == 'male'):
            return 26.51 
        else:
            pass
    else:
        return Age 
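
As an aside, the ValueError from the question can be reproduced in isolation. This is only a minimal sketch with made-up sample values (the `ages` Series is hypothetical, not taken from the Titanic data): it shows that a null check on a whole column returns one boolean per row, which `if` cannot evaluate, while the same check on a single value returns a plain boolean.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, used only to reproduce the error
ages = pd.Series([22.0, np.nan, 35.0])

# On a whole column, pd.isnull returns one boolean per row
mask = pd.isnull(ages)

message = None
try:
    if mask:  # `if` needs a single truth value; pandas refuses to pick one
        pass
except ValueError as err:
    message = str(err)

print(message)

# On a single value, the same check returns a plain boolean that `if` accepts
scalar_check = pd.isnull(ages.iloc[1])
print(scalar_check)
```

Inside a row-wise apply, each of Age, Pclass and Sex is a single value like `ages.iloc[1]` here, which is why the if/elif chain works there.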

Now call apply on the whole DataFrame (not on the Age column alone) with axis=1, so that each row is passed to the function:

train['Age'] = train.apply(lambda x: fill_age(x), axis=1)

In a sample dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age':[1,np.nan,3,np.nan,5,6],
                   'Pclass':[1,2,3,3,2,1],
                   'Sex':['male','female','male','female','male','female']})

Using the answer provided above:

df['Age'] = df.apply(lambda x: fill_age(x),axis=1)

Output:

    Age  Pclass     Sex
0   1.00       1    male
1  28.72       2  female
2   3.00       3    male
3  21.75       3  female
4   5.00       2    male
5   6.00       1  female
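
If the row-wise apply becomes slow on a larger frame, one possible alternative is to keep the fill values in a dictionary keyed by (Pclass, Sex) and let fillna replace only the missing ages. This is a sketch rather than the method used above: the name `fill_values` is hypothetical, and it assumes that 34.61 is the value intended for first-class female passengers.

```python
import numpy as np
import pandas as pd

# Fill values from the function above, keyed by (Pclass, Sex).
# Assumption: 34.61 is meant for (1, 'female').
fill_values = {
    (1, 'female'): 34.61,
    (1, 'male'): 41.2813,
    (2, 'female'): 28.72,
    (2, 'male'): 30.74,
    (3, 'female'): 21.75,
    (3, 'male'): 26.51,
}

df = pd.DataFrame({'Age': [1, np.nan, 3, np.nan, 5, 6],
                   'Pclass': [1, 2, 3, 3, 2, 1],
                   'Sex': ['male', 'female', 'male', 'female', 'male', 'female']})

# Look up a candidate fill value for every row; unknown combinations stay NaN
candidates = pd.Series(
    [fill_values.get(key, np.nan) for key in zip(df['Pclass'], df['Sex'])],
    index=df.index,
)

# fillna aligns on the index and only touches rows where Age is missing
df['Age'] = df['Age'].fillna(candidates)
print(df)
```

On the sample dataframe this should give the same Age column as the apply version, and rows whose (Pclass, Sex) pair is not in the dictionary are left as NaN.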

Upvotes: 1
