Atif Rizwan

Reputation: 685

How to add anomalies to a dataset

I want to detect anomalies in a continuous dataset. The dataset is generated with make_blobs from sklearn.datasets.

Here is the code that generates the dataset:

from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=100, n_features=5, centers=3, cluster_std=1.3, random_state=40)

Now I want to add anomalies to that dataset and then detect them. I already have code for the detection step, but first I need anomalies in the dataset.

Upvotes: 0

Views: 762

Answers (1)

Ehsan

Reputation: 47

As far as I know, there is no function in the scikit-learn API that generates outliers.

However, make_blobs also accepts more detailed parameters: you can specify the number of samples for each cluster, the center of each cluster (one coordinate per feature), and a standard deviation for each cluster. (make_blobs draws every cluster from an isotropic Gaussian distribution.)

The solution is to generate the data in two steps: once for the actual data and once for the anomalies, using different centers and standard deviations.
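A minimal sketch of the two-step idea, reusing the parameters from the question for the normal data. The number of anomalies (10), their center (the origin) and their spread (15.0) are illustrative assumptions, not values from the question, and the names X_normal, X_anom and is_anomaly are made up for this example:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Step 1: the "normal" data, same settings as in the question.
X_normal, _ = make_blobs(n_samples=100, n_features=5, centers=3,
                         cluster_std=1.3, random_state=40)

# Step 2: a few anomalies drawn from one very wide Gaussian.
# Count, center and spread are assumptions chosen for illustration.
X_anom, _ = make_blobs(n_samples=10, n_features=5,
                       centers=[[0.0] * 5],  # a single center at the origin
                       cluster_std=15.0, random_state=41)

# Stack both parts and keep a ground-truth flag (1 = anomaly).
X = np.vstack([X_normal, X_anom])
is_anomaly = np.concatenate([np.zeros(len(X_normal), dtype=int),
                             np.ones(len(X_anom), dtype=int)])
```

Because the anomalies are random draws, a few of them may happen to land close to a normal cluster, so the flag marks how a point was generated rather than guaranteeing that it is far from the rest. You may also want to shuffle X and is_anomaly together before running the detector.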

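X, y = make_blobs(n_samples=sample_list, centers=center_list, cluster_std=deviation_list, n_features=2, random_state=0)

In the above code, sample_list is a sequence of length #NumberOfClusters (the number of samples in each cluster), center_list is an array of shape (#NumberOfClusters, #NumberOfFeatures), and deviation_list is documented as a sequence with one standard deviation per cluster. Because center_list already fixes the number of features, n_features is ignored here.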
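The single call above could be filled in roughly like this. All concrete numbers are assumptions for illustration: two tight clusters hold the normal data, and a third, much wider cluster plays the role of the anomalies. The names n_per_cluster, cluster_centers and cluster_spread are hypothetical stand-ins for the three lists:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Illustrative values only: clusters 0 and 1 are normal, cluster 2 is the anomaly source.
n_per_cluster = [45, 45, 10]                 # samples per cluster
cluster_centers = np.array([[-5.0, -5.0],    # shape (n_clusters, n_features)
                            [5.0, 5.0],
                            [0.0, 0.0]])
cluster_spread = [1.0, 1.0, 8.0]             # one standard deviation per cluster

X, y = make_blobs(n_samples=n_per_cluster, centers=cluster_centers,
                  cluster_std=cluster_spread, random_state=0)

# y holds the index of the generating cluster, so the wide cluster gives the anomaly flag.
is_anomaly = (y == 2)
```

Compared with two separate calls, this keeps everything in one place and returns the samples already shuffled, with y telling you which points came from the wide cluster.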

Upvotes: 1
