I use the DBSCAN implementation from the scikit-learn library and I am getting strange results. The number of estimated clusters increased when I increased the MinPts parameter (min_samples), and from my understanding of the algorithm this should not happen.
Here are my results:
Estimated number of clusters:34 eps=0.9 min_samples=13.0
Estimated number of clusters:35 eps=0.9 min_samples=12.0
Estimated number of clusters:42 eps=0.9 min_samples=11.0 <- strange result here
Estimated number of clusters:37 eps=0.9 min_samples=10.0
Estimated number of clusters:53 eps=0.9 min_samples=9.0
Estimated number of clusters:63 eps=0.9 min_samples=8.0
I use scikit-learn like this:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(X)
db = DBSCAN(eps=eps, min_samples=min_samples, algorithm='kd_tree').fit(X)
and X is an array that contains ~200k 12-dimensional points.
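For reference, the cluster counts above are obtained from db.labels_ in the usual way (this is a sketch of the counting step, not my exact code):

labels = db.labels_
# -1 marks noise; every other label is a cluster id.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"Estimated number of clusters:{n_clusters} eps={eps} min_samples={min_samples}")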
What can be the problem here?
DBSCAN divides points/samples into three categories:

- core points, which have at least min_samples points (including themselves) within distance eps;
- border points, which are not core points but lie within eps of a core point;
- noise points, which are neither.

min_samples in scikit-learn's implementation is the neighborhood density parameter. Now, as you require a denser neighborhood for core points, you get fewer core points, but a core point x losing its status can have three effects depending on the density just outside its neighborhood:

- if x's neighbors are still within reach of other core points, the cluster only shrinks, and some points may turn into border points or noise;
- if x was the only core point linking two dense regions, the cluster splits into two or more smaller clusters;
- if no other core point covers x and its neighbors, they all become noise and the cluster can vanish entirely.

The second effect is why increasing min_samples can increase the reported number of clusters: clusters that were held together by a few marginal core points break apart into several denser pieces.
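To see the splitting effect in isolation, here is a tiny synthetic sketch (the 1-D layout and the eps/min_samples values are made up for illustration, not taken from your data): two dense groups joined by a sparse bridge form a single cluster while the bridge points are core, and split into two clusters as soon as min_samples rises enough for the middle bridge point to lose core status.

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense 1-D groups joined by a sparse "bridge" of three points (illustrative layout).
group_a = [0.0, 0.1, 0.2, 0.3, 0.4]
bridge  = [1.25, 2.10, 2.95]
group_b = [3.8, 3.9, 4.0, 4.1, 4.2]
X = np.array(group_a + bridge + group_b).reshape(-1, 1)

for min_samples in (3, 4):
    labels = DBSCAN(eps=1.0, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"min_samples={min_samples}: {n_clusters} cluster(s)")

# min_samples=3 -> 1 cluster: every bridge point is still a core point, so the groups stay connected.
# min_samples=4 -> 2 clusters: the middle bridge point loses core status and the chain breaks.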