I'm writing a function to remove outliers. When I print the results inside the function it does what I want, but I can't get it to return them. What am I doing wrong? I have similar code set up that does return its results.
When I use print():
```r
NA_Outliers = function(x){
  Q1 <- quantile(x, probs=.25)
  Q3 <- quantile(x, probs=.75)
  iqr = Q3-Q1
  upper_limit = Q3 + (iqr*1.5)
  lower_limit = Q1 - (iqr*1.5)
  for (value in x) {
    if (value > upper_limit | value < lower_limit){
      value = NA
      print(value)
    }else{
      print(value)
    }
  }
}
```
Results:
```
> NA_Outliers(Test)
[1] NA
[1] 2.428675
[1] 2.428384
[1] 2.714187
[1] 2.457054
[1] 2.464337
[1] 2.686667
[1] 2.166072
[1] 2.690987
[1] 2.632692
[1] NA
[1] 2.84985
```
When I use return():
```r
NA_Outliers = function(x){
  Q1 <- quantile(x, probs=.25)
  Q3 <- quantile(x, probs=.75)
  iqr = Q3-Q1
  upper_limit = Q3 + (iqr*1.5)
  lower_limit = Q1 - (iqr*1.5)
  for (value in x) {
    if (value > upper_limit | value < lower_limit){
      value = NA
      return(value)
    }else{
      return(value)
    }
  }
}
```
Results:
```
[1] NA
```
When I set it up this way, it just returns the first value in the column.
```r
NA_Outliers = function(x){
  Q1 <- quantile(x, probs=.25)
  Q3 <- quantile(x, probs=.75)
  iqr = Q3-Q1
  upper_limit = Q3 + (iqr*1.5)
  lower_limit = Q1 - (iqr*1.5)
  for (value in x) {
    if(class(value) == "numeric"){
      value[value > upper_limit | value < lower_limit] = NA
      return(value)
    }else{
      return(value)
    }
  }
}
```
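The problem is that return() exits the function immediately, so on the first pass through the for loop it hands back that single value and the rest of the loop never runs. (Assigning value = NA also only changes the loop variable, not x itself.) If you want to return the values that are not considered outliers, you can simply subset them and return the new vector.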
```r
NA_outliers <- function(x, fac = 1.5, probs = c(0.25, 0.75), na.rm = FALSE) {
  quants <- quantile(x, probs = probs, na.rm = na.rm)
  iqr <- diff(quants)
  out <- x[ (quants[1] - fac*iqr) <= x & x <= (quants[2] + fac*iqr) ]
  return(out)
}
NA_outliers(c(1:10, 100))
# [1] 1 2 3 4 5 6 7 8 9 10
```
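To see where the cut-offs fall for that example call, the fences can be worked out step by step. This is a small sketch using the same defaults as the function (quantile()'s default type 7 and fac = 1.5); the names lower and upper are just illustrative and are not part of the answer's function.

```r
x <- c(1:10, 100)

# Same quartiles the function computes (quantile()'s default type 7)
quants <- quantile(x, probs = c(0.25, 0.75))
iqr <- diff(quants)  # same value as IQR(x) here

# Tukey-style fences: 1.5 * IQR beyond each quartile
lower <- quants[[1]] - 1.5 * iqr[[1]]
upper <- quants[[2]] + 1.5 * iqr[[1]]

# Q1 = 3.5 and Q3 = 8.5, so IQR = 5 and the fences are -4 and 16
c(lower = lower, upper = upper)

# Only 100 falls outside [-4, 16]
x[x < lower | x > upper]
```

With eleven values, quantile() interpolates Q1 halfway between 3 and 4 and Q3 halfway between 8 and 9, which is why 100 is the only value dropped.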
Or, if you'd prefer it to return NA for the outliers:
```r
NA_outliers <- function(x, fac = 1.5, probs = c(0.25, 0.75), na.rm = FALSE) {
  quants <- quantile(x, probs = probs, na.rm = na.rm)
  iqr <- diff(quants)
  x[ (quants[1] - fac*iqr) > x | x > (quants[2] + fac*iqr) ] <- NA
  x
}
NA_outliers(c(1:10, 100))
# [1] 1 2 3 4 5 6 7 8 9 10 NA
```
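If you'd rather keep the for loop from the question, the two changes needed are to write into x by index (assigning to the loop variable only changes a separate copy) and to call return() just once, after the loop has finished. The sketch below is a hypothetical variant named na_outliers_loop; it also assumes you want NA values already in the input to be ignored, hence na.rm = TRUE and the is.na() check. The vectorised versions above remain the more idiomatic choice in R.

```r
# Hypothetical loop-based variant; not part of the answer's functions
na_outliers_loop <- function(x, fac = 1.5) {
  # Quartiles and fences, ignoring any existing NAs (assumption)
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  lower <- q[[1]] - fac * iqr
  upper <- q[[2]] + fac * iqr

  for (i in seq_along(x)) {
    # Assign through the index so x itself changes,
    # not a temporary loop variable
    if (!is.na(x[i]) && (x[i] < lower || x[i] > upper)) {
      x[i] <- NA
    }
  }

  x  # return once, after every element has been checked
}

na_outliers_loop(c(1:10, 100))
na_outliers_loop(c(NA, 1:10, 100))
```

Both calls leave 1 to 10 untouched and turn 100 into NA; the second also keeps the NA that was already in the input.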