Kobe-Wan Kenobi

Reputation: 3864

Anomaly detection - what to use

What system should I use for anomaly detection?

I see that systems like Mahout do not list anomaly detection among the problems they support, only classification, clustering, recommendation...

Any recommendations as well as tutorials and code examples would be great, since I haven't done it before.

Upvotes: 2

Views: 1819

Answers (2)

Hajar Homayouni

Reputation: 590

There are three categories of outlier detection approaches, namely, supervised, semi-supervised, and unsupervised.

  • Supervised: Requires fully labeled training and test datasets. An ordinary classifier is trained first and then applied to new data.
  • Semi-supervised: Uses training and test datasets, but the training data consists only of normal data, without any outliers. A model of the normal class is learned, and outliers are then detected as data points that deviate from that model.
  • Unsupervised: Does not require any labels, and there is no distinction between a training and a test dataset. Data is scored solely based on intrinsic properties of the dataset.

If you have unlabeled data, the following unsupervised anomaly detection approaches can be used to detect abnormal data:

  1. Use an autoencoder, which learns a compressed representation of the features present in the data and flags as outliers the data points that are not well explained by that representation. The outlier score for a data point is calculated from its reconstruction error (i.e., the squared distance between the original data point and its projection). You can find implementations in H2O and TensorFlow.
  2. Use a clustering method, such as a Self-Organizing Map (SOM) or k-prototypes, to cluster your unlabeled data into multiple groups. You can then detect external and internal outliers in the data. External outliers are defined as the records that fall in the smallest cluster. Internal outliers are defined as the records that lie far from the rest of the records inside their cluster. Code is available for both SOM and k-prototypes.
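To make the reconstruction-error idea from item 1 concrete, here is a minimal sketch. Note that it does not train an autoencoder: it swaps in a plain PCA projection computed with NumPy, which plays the same role as a linear autoencoder and is much shorter to show. The function name `reconstruction_scores`, the synthetic data, and the 99th-percentile threshold are all assumptions made for illustration, not something taken from H2O or TensorFlow.

```python
import numpy as np

def reconstruction_scores(X, n_components):
    """Squared reconstruction error per row after projecting onto the top principal components."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Project onto the retained components and map back to the original space.
    projected = centered @ components.T @ components
    return np.sum((centered - projected) ** 2, axis=1)

rng = np.random.default_rng(0)
# Synthetic data (assumption): 200 points lying close to a line in 3-D space.
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))
# Append one point that sits far away from that line (row index 200).
X = np.vstack([X, [3.0, -3.0, 3.0]])

scores = reconstruction_scores(X, n_components=1)
# Illustrative threshold: flag the rows with the largest reconstruction errors.
threshold = np.percentile(scores, 99)
outlier_idx = np.flatnonzero(scores > threshold)
print(outlier_idx)
```

With a real autoencoder the scoring step stays the same: reconstruct each row, take the squared error, and flag the rows with the largest errors. The threshold is a judgment call and usually needs tuning for your data.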

If you have labeled data, there are plenty of supervised classification approaches that you can try for detecting outliers. Examples include neural networks, decision trees, and SVMs.

Upvotes: 3

pvnguyen

Reputation: 116

There is an anomaly detection implementation in scikit-learn, which is based on One-class SVM. You can also check out the ELKI project, which has spatial outlier detection implemented.
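As a rough starting point for the scikit-learn route, here is a minimal sketch using `sklearn.svm.OneClassSVM`. The training data is synthetic, and the parameter values (`nu=0.05`, `gamma="scale"`) are assumptions picked for illustration, so expect to tune them for real data.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
# Hypothetical training set: 500 "normal" 2-D points centered on the origin.
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(X_train)

# One point near the training data and one far away from it.
X_new = np.array([[0.1, -0.2], [8.0, 8.0]])
labels = model.predict(X_new)            # +1 = inlier, -1 = outlier
scores = model.decision_function(X_new)  # negative values fall outside the learned region
print(labels, scores)
```

This matches the semi-supervised setting where the model is fitted on data assumed to be mostly normal and then applied to new points.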

In addition to "anomaly detection", you can also expand your search with "outlier detection", "fraud detection", "intrusion detection" to get some more results.

Upvotes: 6
