Reputation: 804
Hi, I need to cluster points which have values less than or equal to 0.1. My use case goes like this:
0 1649.500000
1 0.864556
2 0.944651
3 0.922754
4 0.829045
5 0.838665
6 1.323263
7 1.397340
8 1.560655
.. .......
27 1.315072
28 1.593657
29 1.222322
... .......
... .......
2890 0.151328
2891 0.149963
2892 0.149285
2893 0.146318
2894 0.147668
2895 0.141159
I need to cluster the points above. I passed the data to DBSCAN as follows:
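import numpy as np
from sklearn.cluster import DBSCAN

X = X.reshape(-1, 1)  # DBSCAN expects a 2-D array of shape (n_samples, n_features)
db = DBSCAN(eps=0.1, min_samples=3, metric='manhattan', n_jobs=-1).fit(X)
labels = db.labels_  # one cluster label per point; -1 marks noise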
Now I print the indices of the points that belong to each cluster:
for i in range(n_clusters_):
    print("Cluster {0} include {1}".format(i, list(np.where(labels == i))))
My output is as follows:
Cluster 0 include [array([ 1, 2, 3, ..., 2893, 2894, 2895])]
As you can see from the data above, position 1 has a value of about 0.86 and position 2895 has a value of about 0.141. How can they end up in the same cluster when I set eps=0.1 and metric="manhattan" (which takes the absolute difference)? What am I missing here? Should I use some other distance? Is my understanding of eps wrong? What should I do in order to get the clustering I want?
Upvotes: 1
Views: 1101
Reputation: 77454
DBSCAN epsilon is not a maximum cluster radius, but a step size. Clusters are built with many such steps, hence distances can be larger.
What you are looking for is probably Leader clustering, an older and simpler algorithm that is not particularly widely used. Keep in mind that the purpose of clustering is to learn about the structure of your data, not to impose a predefined structure.
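A minimal sketch of the Leader clustering idea, in plain Python: each point joins the first cluster whose leader lies within the threshold, otherwise it becomes the leader of a new cluster. The function name `leader_clustering` and the sample values are made up for illustration and are not taken from any library.

```python
def leader_clustering(values, threshold):
    """Assign each value to the first leader within `threshold`, else start a new cluster."""
    leaders = []  # representative (first) value of each cluster
    labels = []   # cluster index for each input value, in input order
    for v in values:
        for k, leader in enumerate(leaders):
            if abs(v - leader) <= threshold:
                labels.append(k)
                break
        else:
            # No existing leader is close enough: v becomes a new leader.
            leaders.append(v)
            labels.append(len(leaders) - 1)
    return labels, leaders


# Illustrative values only (hypothetical, not the asker's full data set).
sample = [0.86, 0.94, 0.92, 0.83, 1.32, 1.40, 1.56, 0.15, 0.14]
sample_labels, sample_leaders = leader_clustering(sample, threshold=0.1)
print(sample_labels)
print(sample_leaders)
```

Every member stays within the threshold of its leader, so clusters cannot grow by chaining. The result does depend on the order in which the points are visited.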
Since your data is one-dimensional, why don't you just sort the data and then identify the threshold values you like? Or just cut at whatever thresholds you want, for example at 0, 0.1, 0.2, 0.3, with simple x < 0.1 masks, a built-in NumPy feature.
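As a rough sketch of the "cut at fixed thresholds" idea, the snippet below bins values with the standard-library `bisect` module so that it runs without extra packages. With a NumPy array the same selection can be written with boolean masks such as `x[x < 0.1]`. The helper name `bin_by_thresholds`, the cut points, and the sample values are assumptions for illustration.

```python
from bisect import bisect_right


def bin_by_thresholds(values, cuts):
    """Return the bin index of each value; bin k holds values with cuts[k-1] <= v < cuts[k]."""
    return [bisect_right(cuts, v) for v in values]


cuts = [0.1, 0.2, 0.3]                       # hypothetical thresholds
points = [0.05, 0.15, 0.149, 0.25, 1.3]      # illustrative values
bins = bin_by_thresholds(points, cuts)
below_first_cut = [v for v in points if v < 0.1]  # plain-Python analogue of an x < 0.1 mask
print(bins)
print(below_first_cut)
```

Bin 0 holds everything below the first cut, and the last bin holds everything at or above the final cut.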
Upvotes: 0
Reputation: 3745
This is exactly how DBSCAN should work.
DBSCAN is a density based clustering algorithm. Put simply, it starts with a random point p; if there are min_points points in range epsilon around p, then it becomes a core point. If two core points are within range epsilon, they are put in the same cluster.
This means that two points far away from each other (i.e., farther apart than epsilon) can be connected through other core points in between and thus belong to the same cluster.
The epsilon and min_points parameters you chose seem to result in one big cluster (with the exception of point 0).
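To make the chaining effect concrete, here is a small, simplified DBSCAN for one-dimensional data in plain Python. It is a sketch for illustration, not scikit-learn's implementation. The function name `dbscan_1d` and the sample values are hypothetical, and the neighbourhood count includes the point itself, which is assumed here to match how `min_samples` is counted.

```python
def dbscan_1d(values, eps, min_samples):
    """Tiny O(n^2) DBSCAN for 1-D data; returns one label per value, -1 means noise."""
    n = len(values)
    # Neighbourhood of i: all points within eps, including i itself.
    neighbours = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue  # only an unlabelled core point can start a new cluster
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


# Eleven points spaced 0.08 apart (each step is below eps) plus one far outlier.
chain = [i * 0.08 for i in range(11)] + [5.0]
chain_labels = dbscan_1d(chain, eps=0.1, min_samples=3)
print(chain_labels)
```

The first eleven points end up in a single cluster even though its two ends are about 0.8 apart, far more than eps, because each consecutive step is smaller than eps. The outlier has no neighbours and is labelled as noise.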
Upvotes: 1