Vladimir Emelianov

Reputation: 91

How to simplify an IF statement

I have an if statement in Python that replaces null values in a dataset with an average based on two other columns.

def impute_age(cols):
    Age = cols[0]
    Pclass = cols[1]
    Sex = cols[2]

    if pd.isnull(Age):
        if Pclass == 1 and Sex == 0:
            return train.loc[(train["Pclass"] == 1) 
                         & (train["Sex_male"] == 0)]["Age"].mean() 
        if Pclass == 2 and Sex == 0:
            return train.loc[(train["Pclass"] == 2) 
                         & (train["Sex_male"] == 0)]["Age"].mean()
        if Pclass == 3 and Sex == 0:
            return train.loc[(train["Pclass"] == 3) 
                         & (train["Sex_male"] == 0)]["Age"].mean()
        if Pclass == 1 and Sex == 1:
            return train.loc[(train["Pclass"] == 1) 
                         & (train["Sex_male"] == 1)]["Age"].mean()
        if Pclass == 2 and Sex == 1:
            return train.loc[(train["Pclass"] == 2) 
                         & (train["Sex_male"] == 1)]["Age"].mean()
        if Pclass == 3 and Sex == 1:
            return train.loc[(train["Pclass"] == 3) 
                         & (train["Sex_male"] == 1)]["Age"].mean()
    else:
        return Age

Here I'm trying to fill in NaNs using the average age of males/females in each passenger class. I feel there must be a much better way of writing this, especially if I were to come across a much bigger dataset. For reference, the train df is the main df with all of the data. For some reason I couldn't get this code to work with a subset of train passed in through the cols argument.

The question is essentially: how can I write this more simply, and is there a way to write this if statement that would still work if my dataset were MUCH larger?

Upvotes: 1

Views: 148

Answers (2)

Umair Mohammad

Reputation: 4635

PCLASS_VALUES = [
[],
]

SEX_VALUES = [
[],
]

return train.loc[(train["Pclass"] == PCLASS_VALUES[Pclass][Sex]) & (train["Sex_male"] == SEX_VALUES[Pclass][Sex])]["Age"].mean() 

Upvotes: 0

Prune

Reputation: 77827

It appears to me that all you need to do is parameterize your inner if:

if pd.isnull(Age):
    return train.loc[(train["Pclass"] == Pclass) 
                   & (train["Sex_male"] == Sex)]["Age"].mean() 
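For context, here is a minimal sketch of how the parameterized check might sit in a complete function, run on a small made-up stand-in for train (the sample values are invented, and only the column names come from the question). It reads the row by column label instead of by position, which is my assumption, not the asker's original signature. The groupby/transform lines at the end are a separate technique, not part of the suggestion above: they compute each group's mean once, which may suit a much larger dataset better.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's `train` frame; the values are invented.
train = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3],
    "Sex_male": [0, 0, 1, 1, 1, 0, 0, 0],
    "Age":      [30.0, np.nan, 40.0, 20.0, np.nan, 10.0, 14.0, np.nan],
})


def impute_age(row):
    """Return the row's age, or its group's mean age if the age is missing."""
    if pd.isnull(row["Age"]):
        same_group = (train["Pclass"] == row["Pclass"]) & (
            train["Sex_male"] == row["Sex_male"]
        )
        return train.loc[same_group, "Age"].mean()
    return row["Age"]


# Row-wise version: simple, but it re-filters `train` for every missing value.
filled_rowwise = train.apply(impute_age, axis=1)

# Vectorized alternative: compute each (Pclass, Sex_male) mean once,
# aligned with the rows of `train`, then fill only the missing ages.
group_mean = train.groupby(["Pclass", "Sex_male"])["Age"].transform("mean")
filled_grouped = train["Age"].fillna(group_mean)
```

Both versions leave existing ages untouched. If every age in a group is missing, that group's mean is itself NaN, so those rows stay unfilled in either version.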

Upvotes: 9
