Reputation: 25
I'm using the "outliers" package to remove some undesirable values, but rm.outlier() does not seem to replace all outliers in a single call. It apparently does not despike recursively, so I have to call it many times to replace every outlier. Here is a reproducible example of the issue:
require(outliers)
# creating a timeseries:
set.seed(12345)
y = rnorm(10000)
# inserting some outliers:
y[4000:4500] = -11
y[4501:5000] = -10
y[5001:5100] = -9
y[5101:5200] = -8
y[5201:5300] = -7
y[5301:5400] = -6
y[5401:5500] = -5
# plotting the timeseries + outliers:
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
# trying to get rid of the outliers by replacing them with the series mean:
new.y = outliers::rm.outlier(y, fill=TRUE, median=FALSE)
new.y = outliers::rm.outlier(new.y, fill=TRUE, median=FALSE)
# plotting the new timeseries "after removing the outliers":
lines(new.y, col="red")
# inserting a legend:
legend("bottomleft", c("raw", "new series"), col=c("black","red"), lty=c(1,1), horiz=FALSE, bty="n")
Does anyone know how to improve the code above so that all outliers are replaced by the mean value?
Upvotes: 1
Views: 228
Reputation: 504
The best approach I could come up with is a for loop that keeps track of the outliers as you find them.
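library(outliers)
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
maxIter <- 100
# outlierQ records every position flagged as an outlier so far
outlierQ <- rep(FALSE, length(y))
for (i in seq_len(maxIter)) {
  # flag the value(s) farthest from the current mean
  bad <- outlier(y, logical = TRUE)
  if (!any(bad)) break
  outlierQ[bad] <- TRUE
  # fill temporarily with the mean so the next pass finds the next extreme
  y[bad] <- mean(y[!bad])
}
# final fill: mean of the points that were never flagged
y[outlierQ] <- mean(y[!outlierQ])
lines(y, col="blue")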
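One caveat, if I read outlier() correctly: it flags the most extreme value on every pass, so the loop above is limited mainly by maxIter rather than stopping by itself. A rough sketch of a variant that does stop is below. It swaps the package's rule for a median/MAD threshold, which is my own assumption and not something the outliers package provides; despike(), k and max_iter are hypothetical names, and k = 4 is an arbitrary choice you would need to tune.

```r
# Hypothetical helper (base R only, not part of the outliers package):
# repeatedly flag points lying more than k scaled MADs from the median of
# the points not yet flagged, and stop once a pass flags nothing new.
despike <- function(x, k = 4, max_iter = 50) {
  flagged <- rep(FALSE, length(x))
  for (i in seq_len(max_iter)) {
    centre <- median(x[!flagged])
    spread <- mad(x[!flagged])  # scaled MAD, comparable to sd for normal data
    new_flags <- !flagged & abs(x - centre) > k * spread
    if (!any(new_flags)) break  # converged: no new outliers this pass
    flagged <- flagged | new_flags
  }
  # replace every flagged point with the mean of the unflagged points
  x[flagged] <- mean(x[!flagged])
  list(values = x, is_outlier = flagged)
}

# same test series as in the question
set.seed(12345)
y <- rnorm(10000)
y[4000:4500] <- -11
y[4501:5000] <- -10
y[5001:5100] <- -9
y[5101:5200] <- -8
y[5201:5300] <- -7
y[5301:5400] <- -6
y[5401:5500] <- -5

res <- despike(y)
new.y <- res$values
```

You can then overlay the result with lines(new.y, col="red") as in the question. The median and MAD are used because the inserted block is large enough to pull the mean and standard deviation noticeably, which can hide the smaller spikes from a mean-based rule.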
Upvotes: 1