Remove outliers based on a preceding value

Question

How can I remove outliers using the criterion that a value cannot be more than 2-fold higher than the value preceding it?

Here is my attempt:

x <- c(1,2,6,4,10,20,50,10,2,1)

remove_outliers <- function(x, na.rm = TRUE, ...) {
  for(i in 1:length(x))
    x < (x[i-1] + 2*x)  # the comparison result is never stored
  x                     # so x is returned unchanged
}

remove_outliers(x)

Expected outcome: 1,2,4,10,20,2,1

Thanks!

Pierre Lapointe · Accepted Answer

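I think the first 10 should be removed from your data, because 10 > 2*4 (its predecessor is 4). Here's a way to do what you want without loops. Each value is compared with its original predecessor, even when that predecessor is itself removed, which is why the 10 that follows the 50 is kept. I'm using the dplyr version of lag(), because base R's stats::lag() only shifts the time-series attributes and leaves the values of a plain vector where they are.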

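library(dplyr)
x<-c(1,2,6,4,10,20,50,10,2,1)
# keep the first value, then every value that is at most twice its predecessor;
# na.omit() drops the NA that lag() produces for the first element
x[c(TRUE,na.omit(x<=dplyr::lag(x)*2))]
[1]  1  2  4 20 10  2  1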
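If you'd rather not load dplyr for a single lag, a minimal base-R sketch of the same rule offsets the vector against itself instead of calling lag(). No NA is produced this way, so na.omit() is not needed. The variable name `keep` is just for illustration.

```r
x <- c(1, 2, 6, 4, 10, 20, 50, 10, 2, 1)

# x[-1] drops the first element and x[-length(x)] drops the last, so the two
# vectors line up each value (from the 2nd on) with its predecessor.
keep <- c(TRUE, x[-1] <= 2 * x[-length(x)])  # the first value is always kept
x[keep]
# [1]  1  2  4 20 10  2  1
```

This gives the same result as the dplyr version on this data and also works for vectors of length 0 or 1.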

EDIT

To use this with a data.frame:

df <- data.frame(id=1:10, x=c(1,2,6,4,10,20,50,10,2,1))
df[c(TRUE,na.omit(df$x<=dplyr::lag(df$x,1)*2)),]

   id  x
1   1  1
2   2  2
4   4  4
6   6 20
8   8 10
9   9  2
10 10  1
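One caveat with the na.omit() approach: it assumes the NA from lag() is the only NA in the comparison. If the column itself has missing values, na.omit() shortens the logical index, R recycles it, and rows get misaligned. Below is a minimal base-R sketch that keeps the mask the same length as the data. The helper name keep_mask() and its fold argument are hypothetical, and treating any comparison that involves NA as "drop" is an assumption you may want to change.

```r
# Hypothetical helper: returns a logical mask the same length as `v`.
# The first value is always kept; each later value is kept only if it is at
# most `fold` times its (original) predecessor. Comparisons involving NA are
# treated as "drop" (FALSE) instead of being removed, so the mask never
# shrinks and the rows stay aligned.
keep_mask <- function(v, fold = 2) {
  n <- length(v)
  if (n == 0) return(logical(0))
  ok <- v[-1] <= fold * v[-n]
  c(TRUE, !is.na(ok) & ok)
}

df <- data.frame(id = 1:10, x = c(1, 2, 6, 4, 10, 20, 50, 10, 2, 1))
df[keep_mask(df$x), ]    # rows with id 1, 2, 4, 6, 8, 9, 10

keep_mask(c(1, NA, 2, 3))
# [1]  TRUE FALSE FALSE  TRUE
```

With this choice, an NA drops both itself and the value right after it, since neither can be compared with its predecessor.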
