GavinMG

Reputation: 21

Removing time series outliers in R with tsclean() - adjusting the sensitivity to outliers

I have a time series object in R that contains data from sensors that measure bike traffic patterns. I would like to remove outliers that represent measurement errors, which are easily identified by visual inspection. The data has both annual and weekly seasonality, which seems to confound the tsclean() function from the forecast package, even though that function is designed for this task.

I do not want to do this manually, because I will be automating the process for about 30 time series objects.

How can I adjust the tsclean() function so it is less sensitive to weekly cycles?

Please help! Thanks.

I am using the following code to remove outliers and plot the results for visual inspection. I apologize, the ts object is way too large to include as an example here.

1. No transformation

library(forecast)

no_outlier <-  tsclean(ts_Rachel_HoteldeVille, lambda = NULL)
plot(ts_Rachel_HoteldeVille, col='black', lwd=2)
lines(no_outlier, col = "red", lwd=2)
title(main = "No transformation")


The original series is in black and the cleaned series is in red, so the points that were removed show up as black lines. Only about 9 data points should be removed. The function is too sensitive: it is removing data points that represent normal weekly variation.

2. Box Cox applied

no_outlier_boxcox <-  tsclean(ts_Rachel_HoteldeVille, lambda = "auto")
plot(ts_Rachel_HoteldeVille, col='black', lwd=2)
lines(no_outlier_boxcox, col = "orange", lwd=2)
title(main = "Box Cox applied")


I also tried applying the Box-Cox transformation, which gives somewhat better results, but they are still not perfect. In this case, the orange lines represent the "cleaned" time series. It still cuts out certain non-outlier data points, and it actually kept an obvious outlier.

Upvotes: 2

Views: 326

Answers (1)

Rob Hyndman

Reputation: 31820

tsclean() fits an MSTL model to the time series, and then removes the seasonal component. Next it fits a "super smoother" to model the trend, which is removed from the seasonally adjusted data. Finally, it identifies outliers in the remainder series using the same threshold as "far-out" values in Tukey's original boxplot. See https://robjhyndman.com/hyndsight/tsoutliers/ for the details.
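To see which points are being flagged before anything is replaced, it can help to call tsoutliers() directly, since tsclean() uses it for the detection step. The following is only a small illustration on simulated weekly data (the series and the injected spike are made up for the example):

```r
library(forecast)

# Simulated series with weekly seasonality and one injected spike (illustrative only)
set.seed(1)
x <- ts(10 + sin(2 * pi * (1:140) / 7) + rnorm(140, sd = 0.1), frequency = 7)
x[50] <- 40

out <- tsoutliers(x)   # the detection step that tsclean() relies on
out$index              # positions flagged as outliers
out$replacements       # values that tsclean() would substitute at those positions
```

Comparing out$index against the points you consider measurement errors shows exactly where the automatic procedure disagrees with visual inspection.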

If it isn't working for your data, you can do something similar, but find a way of modelling the seasonality and/or trend that is better suited to your application. The key idea is to model the signal in some way, remove it, and find the outliers in what's left.
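As a rough sketch of that idea (not what tsclean() does internally), one option is to declare both seasonal periods with msts() so that mstl() models weekly and annual seasonality together, and then apply an adjustable IQR threshold to the remainder. The helper name find_outliers_mstl, the periods c(7, 365) (which assume daily observations), and the multiplier k are all assumptions made for illustration; the data below is simulated.

```r
library(forecast)

# Hypothetical helper, not part of the forecast package.
# periods: seasonal periods in observations (assumes daily data: 7 = weekly, 365 = annual)
# k: IQR multiplier; 3 mirrors the "far-out" rule, larger values flag fewer points
find_outliers_mstl <- function(x, periods = c(7, 365), k = 3) {
  y <- msts(as.numeric(x), seasonal.periods = periods)
  fit <- mstl(y, robust = TRUE)           # trend plus one seasonal term per usable period
  rem <- as.numeric(fit[, "Remainder"])   # what is left after removing the signal
  q <- unname(quantile(rem, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  which(rem < q[1] - k * iqr | rem > q[2] + k * iqr)
}

# Simulated daily counts with weekly and annual cycles and three injected spikes
set.seed(42)
n <- 3 * 365
t <- seq_len(n)
sim <- 100 + 20 * sin(2 * pi * t / 7) + 30 * sin(2 * pi * t / 365) + rnorm(n, sd = 2)
spikes <- c(200, 500, 900)
sim[spikes] <- sim[spikes] + 300

idx <- find_outliers_mstl(sim)

# Replace the flagged points by interpolation
cleaned <- msts(sim, seasonal.periods = c(7, 365))
cleaned[idx] <- NA
cleaned <- na.interp(cleaned)
```

If too many ordinary points are still flagged, raising k (say to 4 or 5) makes the rule less sensitive. Because the helper takes a single series, it could be applied over a list of the roughly 30 series with lapply().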

Upvotes: 1
