Reputation: 33
I am trying to write a function that returns a dataframe with the outliers in a specific column removed, but the dataframe it returns is always empty, no matter which column I use.
remove_outlier = function(dataframe, column){
  average = mean(dataframe[[column]])
  std = sd(dataframe[[column]])
  cutoff = 3 * std
  lower = average - cutoff
  upper = average + cutoff
  print(lower)
  new = dataframe[dataframe[[column]] > lower & dataframe[[column]] < lower]
  return(new)
}
testing = remove_outlier(BostonHousing,'age')
head(testing)
Upvotes: 1
Views: 72
Reputation: 21294
new = dataframe[dataframe[[column]] > lower & dataframe[[column]] < lower]
This line is incorrect. Both comparisons are strict and both use lower, and no value can be greater than lower and less than lower at the same time, so the condition is FALSE for every element and nothing is selected. I suspect you intended to use upper in the second comparison:
new = dataframe[dataframe[[column]] > lower & dataframe[[column]] < upper,]
EDIT: added the comma after the condition so that it selects rows rather than columns; thanks to u/maydin for the catch.
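For reference, here is a minimal sketch of how the whole function might look with both fixes applied (upper in the second comparison, and the comma so the condition indexes rows). The na.rm = TRUE arguments, the is.na() check and drop = FALSE are my own additions, not part of the original code, and the toy data frame is made up purely for illustration so the example runs without BostonHousing.

```r
# Sketch only: na.rm = TRUE, the is.na() check and drop = FALSE are
# assumptions added here; they are not in the original function.
remove_outlier = function(dataframe, column) {
  values = dataframe[[column]]
  average = mean(values, na.rm = TRUE)
  cutoff = 3 * sd(values, na.rm = TRUE)
  lower = average - cutoff
  upper = average + cutoff
  # Keep rows strictly between lower and upper.
  keep = !is.na(values) & values > lower & values < upper
  # The comma makes this a row selection; drop = FALSE keeps a data frame
  # even when the input has a single column.
  dataframe[keep, , drop = FALSE]
}

# Made-up toy data: twenty ordinary values plus one extreme value.
toy = data.frame(x = c(rep(10, 10), rep(12, 10), 1000), id = 1:21)
cleaned = remove_outlier(toy, 'x')
nrow(toy)
nrow(cleaned)
```

With this toy data the row containing 1000 falls outside the three-standard-deviation band and is dropped, while the other rows and all columns are kept.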
Upvotes: 3