Reputation: 41
I know five standard unsupervised outlier detection approaches :
But what type is Isolation Forest? According to google it is a ''tree-based model'', but is it also possible to say that isolation forest is a Distance-based model?
Upvotes: 0
Views: 531
Reputation: 719
While the Anomaly detection's wikipedia page states that it is a Density-based technique
, you should instead refer to the original paper, and Scikit-learn documentation.
Isolation forest
is indeed useful for anomaly detection, and is particularly efficient for large datasets. It is represented by a tree structure, and given that it uses recursive partitioning
,
the number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node.
In addition to all the details provided in the Scikit-learn docs, you can read in the source paper:
Apart from the key difference of isolation versus profiling, iForest is distinguished from existing model-based, distance-based and density-based methods in the following ways:
- The isolation characteristic of iTrees enables them to build partial models and exploit sub-sampling to an extent that is not feasible in existing methods. Since a large part of an iTree that isolates normal points is not needed for anomaly detection; it does not need to be constructed. A small sample size produces better iTrees because the swamping and masking effects are reduced.
- iForest utilizes no distance or density measures to detect anomalies. This eliminates major computational cost of distance calculation in all distance-based methods and density-based methods.
- iForest has a linear time complexity with a low constant and a low memory requirement. To our best knowledge, the best-performing existing method achieves only approximate linear time complexity with high memory usage.
- iForest has the capacity to scale up to handle extremely large data size and high-dimensional problems with a large number of irrelevant attributes.
Upvotes: 2