Reputation: 113
I created a function to drop my outliers. Here is the function:
def dropping_outliers(train, condition):
    drop_index = train[condition].index
    #print(drop_index)
    train = train.drop(drop_index,axis = 0)
When I call it like this:
dropping_outliers(train, ((train.SalePrice<100000) & (train.LotFrontage>150)))
Nothing is dropped. However, when I perform the same steps manually, i.e. look up the index in the dataframe for this condition, I get a valid index (943), and when I do
train = train.drop([943],axis = 0)
Then the row I want is dropped correctly. I don't understand why the function doesn't work, since it's supposed to be doing exactly what I do manually.
Upvotes: 0
Views: 98
Reputation: 405995
At the end of dropping_outliers, it's assigning the result of drop to a local variable, not altering the dataframe passed in. Try this instead:
def dropping_outliers(train, condition):
    drop_index = train[condition].index
    #print(drop_index)
    return train.drop(drop_index,axis = 0)
Then do the assignment when you call the function:
train = dropping_outliers(train, ((train.SalePrice<100000) & (train.LotFrontage>150)))
Also see python pandas dataframe, is it pass-by-value or pass-by-reference.
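To see the difference for yourself, here is a minimal sketch that runs both versions side by side. The toy data and the names drop_rebinding, drop_returning and cleaned are made up for illustration; only the column names and the index 943 come from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names and index 943 mirror the question.
train = pd.DataFrame(
    {"SalePrice": [90000, 250000, 80000], "LotFrontage": [200, 60, 50]},
    index=[943, 10, 11],
)

def drop_rebinding(df, condition):
    # Rebinds the local name df to a new DataFrame.
    # The caller's object is untouched and the result is discarded on return.
    df = df.drop(df[condition].index, axis=0)

def drop_returning(df, condition):
    # Hands the new DataFrame back so the caller can keep it.
    return df.drop(df[condition].index, axis=0)

condition = (train.SalePrice < 100000) & (train.LotFrontage > 150)

drop_rebinding(train, condition)
print(len(train))            # the original frame still has all its rows

cleaned = drop_returning(train, condition)
print(list(cleaned.index))   # the row matching the condition is gone
```

If you would rather modify the dataframe that was passed in, drop also accepts inplace=True, which changes the caller's object directly and returns None. Returning the result, as above, is usually easier to follow.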
Upvotes: 1