Reputation: 677
I want to detect outliers in time series data that contains trend and seasonality components. I want to ignore the peaks that are seasonal, consider only the remaining peaks, and label those as outliers. As I am new to time series analysis, please help me approach this problem.
I am using Python.
I have trained my model and forecast the test data. I can then compute the difference between the forecast and the actual test values, and identify outliers based on the variance observed in those differences.
!pip install pyramid-arima
# Note: the pyramid-arima package was later renamed pmdarima
import math
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from sklearn.metrics import mean_squared_error
from pyramid.arima import auto_arima

# Hold out the last 70 observations as test data and work on the log10 scale
train, test = actual_vals[0:-70], actual_vals[-70:]
train_log, test_log = np.log10(train), np.log10(test)

# Stepwise search for a seasonal ARIMA model with seasonal period m=7
stepwise_model = auto_arima(train_log, start_p=1, start_q=1, max_p=3, max_q=3, m=7,
                            start_P=0, seasonal=True, d=1, D=1, trace=True,
                            error_action='ignore', suppress_warnings=True, stepwise=True)

# Walk-forward validation: refit on the growing history, forecast one step ahead
history = [x for x in train_log]
predictions = list()
predict_log = list()
for t in range(len(test_log)):
    stepwise_model.fit(history)
    output = stepwise_model.predict(n_periods=1)
    predict_log.append(output[0])
    yhat = 10**output[0]  # back-transform from the log10 scale
    predictions.append(yhat)
    obs = test_log[t]
    history.append(obs)

# Plot actual vs. predicted values for the test period
figsize = (12, 7)
plt.figure(figsize=figsize)
plt.plot(test, label='Actuals')
plt.plot(predictions, color='red', label='Predicted')
plt.legend(loc='upper right')
plt.show()
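The step of turning forecast errors into outlier labels is described above but not shown. A minimal sketch of one way it might look follows. The post does not say how the variance-based cutoff was chosen, so this assumes a robust z-score built from the median absolute deviation (MAD) with a cutoff of 3; the function name `flag_forecast_outliers` and the sample numbers are hypothetical.

```python
import numpy as np

def flag_forecast_outliers(actual, predicted, n_sigmas=3.0):
    """Return indices where the forecast error is unusually large.

    Assumption: errors are scored with a robust z-score (median and MAD)
    rather than mean and standard deviation, so that the outliers
    themselves do not inflate the scale estimate.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    center = np.median(errors)
    mad = np.median(np.abs(errors - center))
    robust_sigma = 1.4826 * mad  # scales MAD to match the std of a normal distribution
    if robust_sigma == 0:
        return np.array([], dtype=int)
    return np.flatnonzero(np.abs(errors - center) > n_sigmas * robust_sigma)

# Hypothetical actual and predicted values with one obvious spike at index 4
actual = [10, 12, 11, 13, 40, 12, 11, 10]
predicted = [10.5, 11.5, 11.2, 12.6, 12.0, 12.3, 10.8, 10.4]
outlier_idx = flag_forecast_outliers(actual, predicted)
print(outlier_idx)
```

With the variables from the code above, the call would be `flag_forecast_outliers(test, predictions)`.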
However, this only lets me detect outliers in the test data. I actually need to detect outliers across the whole time series, including the training data.
I used the code below to split the original data into seasonal, trend, and residual components, as can be seen in the image below.
from statsmodels.tsa.seasonal import seasonal_decompose
decomposed = seasonal_decompose()
I then use the residuals to find outliers with a boxplot, since the seasonal and trend components have been removed. Does this make sense?
Or is there a simpler or better approach?
Upvotes: 7
Views: 6277
Reputation: 186
You can:
"Attempt 2: Using Seasonal Decomposition"
Try checking for extreme points; that may lead you to some anomalies in the seasonal series.
import numpy as np
from scipy.signal import argrelextrema
x = np.array([2, 1, 2, 3, 2, 0, 1, 0])
argrelextrema(x, np.greater)
output:
(array([3, 6]),)
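By default `argrelextrema` compares each point only with its immediate neighbours, so it reports every small bump. A short sketch of how the `order` argument widens that neighbourhood, using a made-up array for illustration:

```python
import numpy as np
from scipy.signal import argrelextrema

# Hypothetical data: small bumps plus one dominant peak at index 8
x = np.array([2, 1, 2, 3, 2, 0, 1, 0, 5, 1, 2, 1])

# order=1 (the default): greater than the single neighbour on each side
peaks_order1 = argrelextrema(x, np.greater)[0]

# order=3: must be greater than the three neighbours on each side
peaks_order3 = argrelextrema(x, np.greater, order=3)[0]

print(peaks_order1)  # [ 3  6  8 10]
print(peaks_order3)  # [3 8]
```

Whether a larger `order` helps separate seasonal peaks from anomalies depends on the data; applying it to the residual component rather than the raw series may be a more natural fit for the question.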
Some random data (My implementation of the above argrelextrema):
Upvotes: 2