HKlein

Reputation: 31

Removing Outliers in loop - function will print but not return

I'm writing a function to remove outliers. When I print the results, it does what I want, but I can't get it to return them. What am I doing wrong? I have similar code set up elsewhere that does return its results.

When I print:

NA_Outliers = function(x){
  Q1 <- quantile(x, probs=.25)
  Q3 <- quantile(x, probs=.75)
  iqr = Q3-Q1

  upper_limit = Q3 + (iqr*1.5)
  lower_limit = Q1 - (iqr*1.5)
  for (value in x) {
    if (value> upper_limit | value < lower_limit){
      value = NA
      print(value)
    }else{
      print(value)
    }
  }
}

Results:

> NA_Outliers(Test)
[1] NA
[1] 2.428675
[1] 2.428384
[1] 2.714187
[1] 2.457054
[1] 2.464337
[1] 2.686667
[1] 2.166072
[1] 2.690987
[1] 2.632692
[1] NA
[1] 2.84985

When I return:

NA_Outliers = function(x){
  Q1 <- quantile(x, probs=.25)
  Q3 <- quantile(x, probs=.75)
  iqr = Q3-Q1

  upper_limit = Q3 + (iqr*1.5)
  lower_limit = Q1 - (iqr*1.5)
  for (value in x) {
    if (value> upper_limit | value < lower_limit){
      value = NA
      return(value)
    }else{
      return(value)
    }
  }
}

Results:

[1] NA

When I set it up this way, it just returns the first value in the column.

NA_Outliers = function(x){
  Q1 <- quantile(x, probs=.25)
  Q3 <- quantile(x, probs=.75)
  iqr = Q3-Q1

  upper_limit = Q3 + (iqr*1.5)
  lower_limit = Q1 - (iqr*1.5)
  for (value in x) {
    if(class(value) == "numeric"){
      value[value> upper_limit | value < lower_limit] = NA
      return(value)
    }else{
      return(value)
    }
  }
}

Upvotes: 0

Views: 402

Answers (1)

r2evans

Reputation: 160687

If you want to return the values that are not considered outliers, then you can just subset them and return the new vector.

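NA_outliers <- function(x, fac = 1.5, probs = c(0.25, 0.75), na.rm = FALSE) {
  # lower and upper quantiles (Q1 and Q3 by default)
  quants <- quantile(x, probs = probs, na.rm = na.rm)
  # interquartile range
  iqr <- diff(quants)
  # keep only the values inside [Q1 - fac*IQR, Q3 + fac*IQR]
  out <- x[ (quants[1] - fac*iqr) <= x & x <= (quants[2] + fac*iqr) ]
  return(out)
}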
NA_outliers(c(1:10, 100))
#  [1]  1  2  3  4  5  6  7  8  9 10

Or, if you'd prefer that it return NA in place of the outliers:

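NA_outliers <- function(x, fac = 1.5, probs = c(0.25, 0.75), na.rm = FALSE) {
  # lower and upper quantiles (Q1 and Q3 by default)
  quants <- quantile(x, probs = probs, na.rm = na.rm)
  # interquartile range
  iqr <- diff(quants)
  # overwrite values outside [Q1 - fac*IQR, Q3 + fac*IQR] with NA, keeping the length of x
  x[ (quants[1] - fac*iqr) > x | x > (quants[2] + fac*iqr) ] <- NA
  x
}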
NA_outliers(c(1:10, 100))
#  [1]  1  2  3  4  5  6  7  8  9 10 NA
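
As for why the loop versions in the question return only one value: return() exits the function immediately, so the for loop never gets past its first iteration. Assigning to value inside the loop also changes only the loop variable, not x itself. If you do want to keep an explicit loop, a minimal sketch could look like the following (the name NA_outliers_loop and the index variable i are just illustrative, and NA handling via na.rm = TRUE is an assumption on my part): fill a result vector by index and return it once, after the loop has finished.

```r
# Loop-based sketch: build the result by index, return once after the loop.
# NA_outliers_loop is a hypothetical name used only for this illustration.
NA_outliers_loop <- function(x, fac = 1.5) {
  # assumption: existing NAs are ignored when computing the quartiles
  quants <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  iqr <- quants[2] - quants[1]
  lower_limit <- quants[1] - fac * iqr
  upper_limit <- quants[2] + fac * iqr

  out <- x  # start from a copy so non-outliers are kept as they are
  for (i in seq_along(x)) {
    # skip existing NAs, then mark values outside the limits
    if (!is.na(x[i]) && (x[i] > upper_limit || x[i] < lower_limit)) {
      out[i] <- NA
    }
  }
  out  # a single return, reached only after every element was checked
}

NA_outliers_loop(c(1:10, 100))
#  [1]  1  2  3  4  5  6  7  8  9 10 NA
```

The vectorized versions above do the same thing without the loop, so I would still prefer those in practice.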

Upvotes: 1
