Reputation: 685
I want to detect anomalies in a continuous dataset. The dataset is generated with make_blobs from sklearn.datasets.
Here is the code to generate the dataset:
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=100, n_features=5, centers=3, cluster_std=1.3, random_state=40)
Now I want to add anomalies to that dataset and then detect them. I already have code for the detection, but first I need anomalies in the dataset.
Upvotes: 0
Views: 762
Reputation: 47
As far as I know, there is no function in the sklearn API that generates outliers.
But make_blobs also accepts more detailed parameters: you can specify the number of samples for each cluster, as well as the center coordinates and the standard deviation of each cluster. (make_blobs uses a Gaussian distribution for generating datasets.)
The solution is to generate the data in two steps: once for the actual data and once for the anomalies, using different centers and standard deviations.
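X, y = make_blobs(n_samples=sample_list, centers=center_list, cluster_std=diviation_list, n_features=2, random_state=0)
In the above code, specify sample_list as a 1-D sequence of length #NumberOfClusters, center_list as an array of shape (#NumberOfClusters, #NumberOfFeatures), and diviation_list as a sequence of length #NumberOfClusters (one standard deviation per cluster, which is the documented form of cluster_std). When center_list is given as an array, the number of features is taken from it.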
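The two-step idea could look roughly like the sketch below. The first call reuses the settings from the question. The second call is an assumption for illustration: the anomaly count (n_anomalies), the single wide blob centered at the origin, and its standard deviation of 15.0 are all made-up values, so tune them until the anomalies are as far from the normal clusters as you need.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Step 1: the "normal" data, using the settings from the question.
X_normal, _ = make_blobs(n_samples=100, n_features=5, centers=3,
                         cluster_std=1.3, random_state=40)

# Step 2: the anomalies, drawn from one very wide Gaussian blob.
# n_anomalies, the origin center and cluster_std=15.0 are illustrative choices.
n_anomalies = 10
X_anom, _ = make_blobs(n_samples=n_anomalies, centers=np.zeros((1, 5)),
                       cluster_std=15.0, random_state=41)

# Combine both parts and keep a ground-truth flag (1 = injected anomaly).
X = np.vstack([X_normal, X_anom])
is_anomaly = np.concatenate([np.zeros(len(X_normal), dtype=int),
                             np.ones(n_anomalies, dtype=int)])

# Shuffle so the anomalies are not all at the end of the array.
rng = np.random.default_rng(0)
perm = rng.permutation(len(X))
X, is_anomaly = X[perm], is_anomaly[perm]
```

Since the anomalies are random draws, a few of them may still land close to a normal cluster. The is_anomaly array records which rows were injected, so you can compare it against what your detector reports.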
Upvotes: 1