Reputation: 5278
I'm using clustering algorithms like DBSCAN. It returns a 'cluster' called -1, which contains the points that are not part of any cluster. For each of these points I want to determine the distance to the nearest cluster, to get something like a metric for how abnormal the point is. Is this possible? Or are there any alternatives for this kind of metric?
Upvotes: 4
Views: 3620
Reputation: 8829
The answer will depend on the linkage strategy you choose. I'll give the example of single linkage.
First, you can construct the distance matrix of your data.
from sklearn.metrics.pairwise import pairwise_distances
dist_matrix = pairwise_distances(X)
Then, you'll extract the nearest cluster:
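# unclustered_points: row indices of the noise points (label -1)
# clusters: one array of row indices per cluster
for point in unclustered_points:
    distances = []
    for cluster in clusters:
        # Single linkage: distance to the closest member of this cluster
        distances.append(dist_matrix[point, cluster].min())
    # Position (in `clusters`) of the cluster with the smallest distance
    nearest = min(range(len(distances)), key=lambda i: distances[i])
    print("The nearest cluster for {} is {} (distance {})".format(point, nearest, distances[nearest]))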
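The loop above takes `unclustered_points` and `clusters` as given. As a minimal sketch, assuming the labels come from scikit-learn's `DBSCAN`, they could be built as index arrays like this. The toy data, the `eps` and `min_samples` values, and the `nearest` dictionary are made up for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import pairwise_distances

# Toy data (illustrative only): two small groups and one stray point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [1.0, 1.0]])
labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(X)

# Row indices of the noise points, and one index array per real cluster.
unclustered_points = np.flatnonzero(labels == -1)
cluster_ids = [c for c in np.unique(labels) if c != -1]
clusters = [np.flatnonzero(labels == c) for c in cluster_ids]

dist_matrix = pairwise_distances(X)

# Map each noise point to (nearest cluster label, single-linkage distance).
nearest = {}
for point in unclustered_points:
    distances = [dist_matrix[point, members].min() for members in clusters]
    best = int(np.argmin(distances))
    nearest[int(point)] = (int(cluster_ids[best]), float(distances[best]))
```

The distance stored for each noise point can then be used directly as the abnormality score asked about in the question.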
EDIT: This works, but it's O(n^2), as noted by Anony-Mousse. Considering only the core points is a better idea because it cuts down on the work. In addition, it is somewhat similar to centroid linkage.
Upvotes: 4
Reputation: 77454
To stay closer to the intuition of DBSCAN, you should probably consider only core points.
Put the core points into a nearest-neighbor searcher. Then query it with each noise point and use the cluster label of the nearest core point.
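A rough sketch of this idea, assuming scikit-learn's `DBSCAN` (which exposes the core points through `core_sample_indices_`) and `NearestNeighbors` as the searcher. The toy data and the parameter values are invented for illustration, and `noise_score` and `nearest_cluster` are just names chosen here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Toy data (illustrative only): two tight blobs plus two stray points.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(30, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(30, 2))
strays = np.array([[1.0, 1.0], [5.0, 8.0]])
X = np.vstack([blob_a, blob_b, strays])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)
labels = db.labels_

core_idx = db.core_sample_indices_        # row indices of the core points
noise_idx = np.flatnonzero(labels == -1)  # row indices of the noise points

# Index only the core points, then query with the noise points.
nn = NearestNeighbors(n_neighbors=1).fit(X[core_idx])
dist, pos = nn.kneighbors(X[noise_idx])

noise_score = dist[:, 0]                       # distance to nearest core point
nearest_cluster = labels[core_idx[pos[:, 0]]]  # label of that core point
```

Here `noise_score` could serve as the abnormality metric from the question: the farther a noise point is from any core point, the larger its score.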
Upvotes: 1