Tom

Reputation: 25

How to get rid of multiple outliers in a timeseries in R?

I'm using the "outliers" package to remove some undesirable values, but the rm.outlier() function does not seem to replace all outliers at once. It apparently does not despike recursively, so I have to call it many times to replace every outlier. Here is a reproducible example of the issue:

library(outliers)
# creating a timeseries:
set.seed(12345)
y <- rnorm(10000)
# inserting some outliers:
y[4000:4500] <- -11
y[4501:5000] <- -10
y[5001:5100] <- -9
y[5101:5200] <- -8
y[5201:5300] <- -7
y[5301:5400] <- -6
y[5401:5500] <- -5
# plotting the timeseries + outliers:
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
# trying to get rid of some outliers by replacing them with the series mean value:
new.y <- outliers::rm.outlier(y, fill=TRUE, median=FALSE)
new.y <- outliers::rm.outlier(new.y, fill=TRUE, median=FALSE)
# plotting the new timeseries "after removing the outliers":
lines(new.y, col="red")
# inserting a legend:
legend("bottomleft", c("raw", "new series"), col=c("black","red"), lty=c(1,1), horiz=FALSE, bty="n")

Does anyone know how to improve the code above so that all outliers are replaced by the mean value?

Upvotes: 1

Views: 228

Answers (1)

user2802241

Reputation: 504

The best approach I could come up with is a for loop that keeps track of the outliers as it finds them.

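library(outliers)

plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")

maxIter <- 100                        # upper bound on the number of passes
outlierQ <- rep(FALSE, length(y))     # marks every point flagged so far

for (i in seq_len(maxIter)) {
  bad <- outlier(y, logical = TRUE)   # TRUE where y equals the current most extreme value
  if (!any(bad)) break                # stop early if nothing is flagged
  outlierQ[bad] <- TRUE
  y[bad] <- mean(y[!bad])             # temporary fill so the next pass finds the next value
}

# final fill: mean of the points that were never flagged
y[outlierQ] <- mean(y[!outlierQ])

lines(y, col="blue")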
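One caveat with the loop: as far as I can tell, outlier() always flags the value farthest from the mean, so the break is unlikely to fire and the loop will probably run all maxIter passes, eventually replacing ordinary extremes of the noise as well. If that matters, a fixed threshold may be easier to reason about. Below is a minimal base-R sketch of that idea, using a median/MAD rule instead of the "outliers" package. The helper name replace_outliers and the cutoff k = 3 are my own assumptions, not anything from the package, and the cutoff would need tuning for other data (a larger k may miss the block at -5).

```r
# Hypothetical helper (not part of the 'outliers' package): flag every point
# farther than k robust standard deviations from the median, in one pass.
replace_outliers <- function(x, k = 3) {
  center <- median(x, na.rm = TRUE)
  spread <- mad(x, na.rm = TRUE)        # MAD, scaled to be comparable to sd for normal data
  bad <- abs(x - center) > k * spread   # TRUE where the point is beyond the cutoff
  bad[is.na(bad)] <- FALSE              # leave missing values untouched
  x[bad] <- mean(x[!bad], na.rm = TRUE) # fill with the mean of the retained points
  list(y = x, outlier = bad)
}

# Same test series as in the question
set.seed(12345)
y <- rnorm(10000)
y[4000:4500] <- -11
y[4501:5000] <- -10
y[5001:5100] <- -9
y[5101:5200] <- -8
y[5201:5300] <- -7
y[5301:5400] <- -6
y[5401:5500] <- -5

res <- replace_outliers(y)
sum(res$outlier)                        # number of points that were replaced
```

res$y can then be drawn over the original plot with lines(res$y, col="blue"). The median and MAD are used here because the spikes make up about 15% of the series and would pull the mean and standard deviation strongly toward them.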

Upvotes: 1
