Reputation: 33
I have the following data frame of dates and values.
date Value
2022-07-19 44.43000000
2022-07-20 43.43000000
2022-07-21 42.43000000
2022-07-22 41.43000000
2022-07-25 41.43000000
... ...
2022-09-02 86.40000000
2022-09-06 85.13000000
2022-09-07 86.86000000
2022-09-08 88.44000000
2022-09-09 89.44000000
What would be an efficient way in Python to detect this outlier?
Upvotes: 2
Views: 140
Reputation: 1057
We have a similar use case: our data was seasonal time series data with uniform timestamps and no missing data.
If we find missing data in a specific timeframe, we increase the timeframe window. We use different uniform timeframes, for example 1 hr, 2 hr, 4 hr, 8 hr, 16 hr, and 24 hr, to build the time series anomaly detection model.
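One way to read the "increase the window when data is missing" step is sketched below. This is only an illustration with pandas, not the code from our model: the helper name `smallest_complete_window`, the use of the mean as the aggregate, and the sample data are all assumptions.

```python
import numpy as np
import pandas as pd

def smallest_complete_window(series, hours=(1, 2, 4, 8, 16, 24)):
    """Return (hours, resampled) for the smallest window with no empty bins.

    Returns (None, None) if every candidate window still has gaps.
    """
    for h in hours:
        # Empty bins come back as NaN after taking the mean.
        resampled = series.resample(pd.Timedelta(hours=h)).mean()
        if not resampled.isna().any():
            return h, resampled
    return None, None

# Illustrative data: 48 hourly readings with one reading removed.
index = pd.date_range("2022-07-19", periods=48, freq=pd.Timedelta(hours=1))
series = pd.Series(np.arange(48, dtype=float), index=index).drop(index[5])

window_hours, resampled = smallest_complete_window(series)
print(window_hours, len(resampled))
```

Here the 1 hr window has an empty bin, so the helper falls back to the next window that covers the gap.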
TSIF (Time Series Isolation Forest) is able to detect anomalies at the top, at the bottom, and in between as well. In our case the results were good. We added additional features such as day of week and hour of day.
As your data is on a daily timeframe, the hour of day feature cannot be applied.
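For daily data, a minimal sketch could look like the following. It uses scikit-learn's `IsolationForest` as a stand-in, which is an assumption and not necessarily the TSIF implementation from the video. The data is synthetic with one injected spike, and the contamination value of 0.01 is just a starting guess.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic daily (business day) data, for illustration only.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2022-01-03", periods=180)
values = 50 + rng.normal(0, 1, size=len(dates))
values[100] = 95.0  # inject one obvious spike
df = pd.DataFrame({"date": dates, "Value": values})

# Day of week is the only calendar feature that makes sense on daily data.
df["day_of_week"] = df["date"].dt.dayofweek

model = IsolationForest(contamination=0.01, random_state=0)
# fit_predict returns -1 for anomalies and 1 for normal points.
df["anomaly"] = model.fit_predict(df[["Value", "day_of_week"]])

outliers = df[df["anomaly"] == -1]
print(outliers[["date", "Value"]])
```

With your own data frame you would replace the synthetic part with your `date` and `Value` columns (after converting `date` with `pd.to_datetime`).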
Contamination factor:
The contamination factor can be static or dynamic, based on the use case.
The higher the contamination factor, the more anomalies get detected, and vice versa.
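The effect of the contamination factor can be seen with a small experiment. Again this is a sketch with scikit-learn's `IsolationForest` on made-up data, and the three contamination values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Made-up one-feature data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(50, 1, size=(200, 1))

counts = {}
for contamination in (0.01, 0.05, 0.10):
    # Same random_state, so only the decision threshold changes.
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(X)
    counts[contamination] = int((labels == -1).sum())

print(counts)  # number of flagged points per contamination value
```

Because the contamination value only moves the score threshold, a larger value never flags fewer points on the same fitted data.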
A minimum of 6 months of data will give the best possible results (if the data is cyclic and seasonal) and more exact anomalies, as the model will have enough data to train on and detect anomalies.
If possible, share 6 months of data and I will share the results with the best contamination value for the daily timeframe.
You can find the TSIF model on YouTube, with the complete code and dataset, at the link below.
https://www.youtube.com/watch?v=hkXPdkPfgoo
Upvotes: 1