Reputation: 938
Hence the question: what is a clear definition of Real-time Anomaly Detection?
I am investigating the field of Anomaly Detection, and in many papers the approach is described as real-time, while in many others it is simply called Anomaly Detection.
I happened to discover, correct me if I am wrong, that most of the so-called real-time approaches are actually something like near-real-time. Specifically, they are some sort of unsupervised, context-based anomaly detection on time series, where the context is almost always the bucket size. In other words, the algorithm processes micro-batches of data, hence the near-real-time.
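To illustrate what I mean by micro-batches, here is a rough sketch in Python (my own illustration, not taken from any of the papers; `stream()`, `BUCKET_SIZE` and `is_bucket_anomalous` are just placeholder names): anomalies can only be reported once a full bucket has arrived, which is why I would call it near-real-time rather than real-time.

```python
# Minimal sketch of bucket-based (micro-batch) anomaly detection.
# All names here are illustrative, not from any specific framework.
from collections import deque
from statistics import mean, stdev

BUCKET_SIZE = 60             # samples per micro-batch ("the context")
history = deque(maxlen=100)  # running history of past bucket means

def is_bucket_anomalous(bucket, threshold=3.0):
    """Flag a bucket whose mean deviates strongly from past bucket means."""
    m = mean(bucket)
    if len(history) >= 2:
        mu, sigma = mean(history), stdev(history)
        anomalous = sigma > 0 and abs(m - mu) / sigma > threshold
    else:
        anomalous = False
    history.append(m)
    return anomalous

bucket = []
for sample in stream():             # stream() stands in for the data source
    bucket.append(sample)
    if len(bucket) == BUCKET_SIZE:  # detection only runs when a bucket is full,
        flag = is_bucket_anomalous(bucket)  # hence "near-real-time"
        bucket.clear()
```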
Now, I was wondering whether there is a difference between the two kinds of anomaly detection. If so, how do they differ from each other, and is there a threshold on the bucket size (if there is one)?
This set of questions comes from the fact that I am conducting a study on the performance/prediction quality of different Anomaly Detection frameworks, and I was wondering whether this difference is substantial, since it would imply two different evaluation metrics. I would like to read some authoritative sources on this matter.
Upvotes: 7
Views: 718
Reputation: 397
In fact, many blogs and papers also make this point explicitly.
Therefore, rather than calling these methods "real-time anomaly detection", I prefer to call them "anomaly detection for streaming data". If your method is very fast at detecting anomalies in streaming data (for example, it takes 0.00001 s per detection), it can be called "real-time anomaly detection".
So the difference between "real-time anomaly detection" and "anomaly detection" is that "real-time anomaly detection" is a subset of "anomaly detection": on top of that, real-time anomaly detection is dedicated to detecting anomalies in streaming data, and it has to be very fast.
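As a rough illustration of what I mean (just a sketch I wrote, not from any library; `stream()` and `update_and_check` are placeholder names): each point is scored as soon as it arrives, and whether you may call it "real-time" depends on how small the per-point latency turns out to be.

```python
# Sketch of per-sample detection on streaming data, using running
# mean/variance (Welford's algorithm) so each update is O(1).
import time

n, mu, m2 = 0, 0.0, 0.0  # Welford running statistics

def update_and_check(x, threshold=3.0):
    """Update running statistics with x and flag it if it is a strong outlier."""
    global n, mu, m2
    n += 1
    delta = x - mu
    mu += delta / n
    m2 += delta * (x - mu)
    sigma = (m2 / (n - 1)) ** 0.5 if n > 1 else 0.0
    return sigma > 0 and abs(x - mu) / sigma > threshold

for x in stream():                       # stream() stands in for the data source
    t0 = time.perf_counter()
    anomalous = update_and_check(x)
    latency = time.perf_counter() - t0   # how small this latency is decides
                                         # whether "real-time" is a fair label
```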
Upvotes: 0
Reputation: 359
In computer graphics, real-time processing means fast enough to appear as motion; in practice that means processing images at a minimum of 24 fps.
Applying this meaning to real-time anomaly detection, consider a live video feed: the anomaly detection algorithm must keep up with the feed, so detection has to complete in under ~40 ms per frame.
This constraint drastically changes the trade-off between the quality of the anomaly detection and the speed of processing.
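As a quick sketch of what that budget looks like in code (illustrative only; `next_frame` and `detect_anomalies` are placeholders for your own capture and detection code):

```python
# Per-frame time budget for a 24 fps feed.
import time

FPS = 24
FRAME_BUDGET = 1.0 / FPS   # ~41.7 ms per frame

while True:
    frame = next_frame()               # grab the next video frame (placeholder)
    t0 = time.perf_counter()
    result = detect_anomalies(frame)   # must finish within the budget (placeholder)
    elapsed = time.perf_counter() - t0
    if elapsed > FRAME_BUDGET:
        print(f"missed real-time deadline: {elapsed * 1000:.1f} ms > "
              f"{FRAME_BUDGET * 1000:.1f} ms")
```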
A quick Google search turns up some literature on this trade-off: https://www.researchgate.net/publication/224258100_Real-time_camera_anomaly_detection_for_real-world_video_surveillance
Upvotes: 1
Reputation: 2229
From my perspective it all comes down to the definition of "Real Time".
As a control engineer, the real-time definition I live by is simply: fast enough to process the received data before the next sample arrives. This means that if you know the sample rate, you know how much time you have to process the sensor data.
In control theory it really does not matter how long the processing algorithm's memory is (i.e. the bucket size, or sensor buffer length). The chosen sample rate and the responsiveness of the controller depend on the dynamics of the controlled process.
So real time for something like a household radiator controller could be one sample per minute, which means you can process a very long history of samples. In principle you could train a neural network on the last two years of data and then let it do the anomaly detection once for every received sample.
If it is a radar, where data is coming in at nanosecond sample rates, you probably won't have time to do more than apply a threshold value.
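A small sketch of that definition (my own illustration; `process_sample` and `sample_rate_hz` are placeholder names): the time budget is simply the sample period, and whatever detection fits inside it counts as real time for that process.

```python
# The control-engineering view: the budget is the sample period.
import time

def fits_real_time(process_sample, sample, sample_rate_hz):
    """Return True if processing one sample finishes before the next arrives."""
    budget = 1.0 / sample_rate_hz
    t0 = time.perf_counter()
    process_sample(sample)
    return (time.perf_counter() - t0) <= budget

# A radiator controller sampled once per minute leaves a ~60 s budget,
# so even a heavyweight model can be "real time" here:
#   fits_real_time(neural_net_detector, sample, sample_rate_hz=1/60)
# A radar at nanosecond sample rates leaves almost no budget, so only
# something like a simple threshold check will fit.
```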
Anomaly detection as a field of theory is in itself independent of how much time is required to process a time series, so as I see it, the difference is the overlap (the Venn diagram) between the real-time requirements of the process and the time spent on anomaly detection by any given algorithm.
So real-time anomaly detection is a subset of anomaly detection algorithms, where the size of the subset is given by the relationship between the real-time requirements and the available processing power.
Upvotes: 0
Reputation: 451
Interestingly, I've recently thought about some similar topics for a hobby project and found some interesting blogs by Crunchmetrics, a company specializing in ML-based anomaly detection. The gist:
Real time - there is a training or baseline dataset which the system can reference. The reference "lookup" is fast enough to appear real-time, if optimized of course.
Near real time - there is no existing training or statistical model, and the system must compute baselines, data frames or ranges as it goes, which impacts the speed of decision making.
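To make the contrast concrete, here is a rough sketch of how I read those two definitions (my own code with hypothetical class names, not anything from Crunchmetrics):

```python
from statistics import mean, stdev

# "Real time" per that definition: the baseline is trained ahead of time,
# so scoring a new point is just a fast lookup/comparison.
class PretrainedDetector:
    def __init__(self, training_data):
        self.mu = mean(training_data)
        self.sigma = stdev(training_data)

    def is_anomalous(self, x, threshold=3.0):
        return self.sigma > 0 and abs(x - self.mu) / self.sigma > threshold

# "Near real time" per that definition: no prior model, so the baseline is
# recomputed from recent data as it arrives, which adds latency.
class OnTheFlyDetector:
    def __init__(self, window=1000):
        self.window, self.buffer = window, []

    def is_anomalous(self, x, threshold=3.0):
        self.buffer = (self.buffer + [x])[-self.window:]
        if len(self.buffer) < 2:
            return False
        mu, sigma = mean(self.buffer), stdev(self.buffer)  # recomputed each time
        return sigma > 0 and abs(x - mu) / sigma > threshold
```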
One blog I found useful... (I have no relationship with this company): anomaly blog post
Upvotes: 4