user3234112

Reputation: 113

function to drop outliers

I created a function to drop my outliers. Here is the function:

def dropping_outliers(train, condition):
    drop_index = train[condition].index
    #print(drop_index)
    train = train.drop(drop_index,axis = 0)

and when I do

dropping_outliers(train, ((train.SalePrice<100000)  & (train.LotFrontage>150)))

Nothing is dropped. However, when I run the function's steps manually, i.e. get the index in the dataframe for this condition, I do get a valid index (943), and when I do

train = train.drop([943],axis = 0)

Then the row I want is dropped correctly. I don't understand why the function doesn't work, as it's supposed to be doing exactly what I am doing manually.

Upvotes: 0

Views: 98

Answers (1)

Bill the Lizard

Reputation: 405995

At the end of dropping_outliers, the result of drop is assigned to the local variable train. That only rebinds the name inside the function; it does not alter the dataframe that was passed in. Try this instead:

def dropping_outliers(train, condition):
    drop_index = train[condition].index
    #print(drop_index)
    return train.drop(drop_index,axis = 0)

Then do the assignment when you call the function:

train = dropping_outliers(train, ((train.SalePrice<100000)  & (train.LotFrontage>150)))
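
To check the returning version end to end, here is a minimal sketch on a toy frame. The column names come from the question, but the three rows of data are made up for illustration and are not the asker's dataset.

```python
import pandas as pd

def dropping_outliers(train, condition):
    # Index labels of the rows matching the boolean condition
    drop_index = train[condition].index
    # drop returns a new frame; hand it back to the caller
    return train.drop(drop_index, axis=0)

# Hypothetical toy data: only row 0 is both cheap and wide
train = pd.DataFrame({
    "SalePrice": [90000, 250000, 80000],
    "LotFrontage": [200, 60, 70],
})

condition = (train.SalePrice < 100000) & (train.LotFrontage > 150)
train = dropping_outliers(train, condition)

# Index labels that survive the drop
remaining_index = list(train.index)
```

Because the result is assigned back to train at the call site, the outlier row is gone from the caller's variable afterwards.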

Also see python pandas dataframe, is it pass-by-value or pass-by-reference.
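
The difference between rebinding a local name and mutating the passed-in object can be sketched as follows. The helper names rebind_only and drop_in_place and the two-row frame are hypothetical, chosen only for this illustration; drop's inplace=True option is used as one possible alternative, not necessarily the preferred style.

```python
import pandas as pd

def rebind_only(df, condition):
    # Rebinds the local name df to a new frame; the caller's object is untouched
    df = df.drop(df[condition].index, axis=0)

def drop_in_place(df, condition):
    # Mutates the very object the caller passed in
    df.drop(df[condition].index, axis=0, inplace=True)

# Hypothetical toy data: row 0 matches the outlier condition
frame = pd.DataFrame({"SalePrice": [90000, 250000], "LotFrontage": [200, 60]})
mask = (frame.SalePrice < 100000) & (frame.LotFrontage > 150)

rebind_only(frame, mask)
rows_after_rebind = len(frame)    # the caller's frame is unchanged

drop_in_place(frame, mask)
rows_after_inplace = len(frame)   # the caller's frame lost the matching row
```

Returning the new frame, as in the answer above, is usually easier to reason about than in-place mutation, since the caller can see exactly where train changes.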

Upvotes: 1
