Mithila Palkar

Reputation: 33

Find outliers in time series data

I have the following data frame of datetime stamps and values.

date        Value                
2022-07-19 44.43000000
2022-07-20 43.43000000
2022-07-21 42.43000000
2022-07-22 41.43000000
2022-07-25 41.43000000

... ...
2022-09-02  86.40000000
2022-09-06  85.13000000
2022-09-07  86.86000000
2022-09-08  88.44000000
2022-09-09  89.44000000

What would be an efficient way in Python to find the outliers in this data?

Upvotes: 2

Views: 140

Answers (1)

Divyank

Reputation: 1057

We had a similar use case: our data was seasonal time series data with uniform timestamps and no missing values. If data was missing within a given timeframe, we increased the timeframe window. We used several uniform timeframes (for example 1 hr, 2 hr, 4 hr, 8 hr, 16 hr, 24 hr) to build a time series anomaly detection model.

TSIF (Time Series Isolation Forest) is able to detect anomalies at the top, at the bottom, and in between as well. The results were very good for us. We added additional features such as day of week and hour of day.

As your data is at a daily timeframe, the hour-of-day feature cannot be applied.
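As a rough sketch of the feature step for daily data, the snippet below derives a day-of-week column with pandas. It uses only the ten rows visible in the question (the elided middle of the series is not reproduced), and the column name `day_of_week` and the `features` variable are illustrative choices, not something taken from the linked notebook.

```python
import pandas as pd

# The ten rows shown in the question; the elided rows in between are omitted.
df = pd.DataFrame(
    {
        "date": pd.to_datetime([
            "2022-07-19", "2022-07-20", "2022-07-21", "2022-07-22", "2022-07-25",
            "2022-09-02", "2022-09-06", "2022-09-07", "2022-09-08", "2022-09-09",
        ]),
        "Value": [44.43, 43.43, 42.43, 41.43, 41.43,
                  86.40, 85.13, 86.86, 88.44, 89.44],
    }
)

# Calendar feature that still makes sense at daily granularity:
# Monday=0 ... Sunday=6. An hour-of-day column would be constant here.
df["day_of_week"] = df["date"].dt.dayofweek  # illustrative column name

# Feature matrix that could be handed to an anomaly detection model.
features = df[["Value", "day_of_week"]]
print(features)
```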

Contamination factor: the contamination factor can be static or dynamic, depending on the use case.

The higher the contamination factor, the more anomalies get detected, and vice versa. A minimum of 6 months of data will give the best possible results (if the data is cyclic and seasonal) and more precise anomalies, as the model will have enough data to train on.
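To illustrate the effect of the contamination factor, here is a minimal sketch using scikit-learn's `IsolationForest`. The choice of scikit-learn is an assumption (the answer does not name a library), the series is synthetic and is not the asker's data, and the helper `flag_anomalies` is a hypothetical name introduced only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic daily series for illustration only (NOT the asker's data):
# business days, a weekly pattern plus noise, and three injected spikes.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2022-01-03", periods=200)
dow = dates.dayofweek.to_numpy()
values = 50 + 2 * np.sin(2 * np.pi * dow / 5) + rng.normal(0, 1, len(dates))
values[[40, 120, 180]] += 25  # hypothetical anomalies

df = pd.DataFrame({"date": dates, "Value": values, "day_of_week": dow})


def flag_anomalies(frame, contamination):
    """Return a boolean Series that is True where the model flags a row."""
    model = IsolationForest(contamination=contamination, random_state=42)
    # fit_predict returns -1 for rows treated as anomalies and 1 otherwise.
    labels = model.fit_predict(frame[["Value", "day_of_week"]])
    return pd.Series(labels == -1, index=frame.index)


# A larger contamination value makes the model flag a larger share of rows.
for c in (0.02, 0.10):
    print(c, int(flag_anomalies(df, c).sum()))
```

Because `contamination` sets the expected share of anomalies, it controls how many rows are flagged rather than how anomalous they truly are, so it is worth tuning against points you already know to be unusual.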

If possible, share 6 months of data and I will share the results with the best contamination value for the daily timeframe.

You can find the TSIF model explained on YouTube, and the complete code with dataset on GitHub, at the links below.

https://www.youtube.com/watch?v=hkXPdkPfgoo

https://github.com/srivatsan88/End-to-End-Time-Series/blob/master/Anomaly_Detection_using_Isolation_Forest_Feature_Engineering.ipynb

Upvotes: 1
