Reputation: 970
I'm using DBSCAN to cluster images, but it gives an unexpected result. Let's assume I have 10 images.
First, I read the images in a loop using cv2.imread.
Then I compute the structural similarity index (SSIM) between each pair of images. After that, I have a matrix like this:
[[ 1. -0.00893619 0. 0. 0. 0.50148778 0.47921832 0. 0. 0. ]
[-0.00893619 1. 0. 0. 0. 0.00996088 -0.01873205 0. 0. 0. ]
[ 0. 0. 1. 0.57884212 0. 0. 0. 0. 0. 0. ]
[ 0. 0. 0.57884212 1. 0. 0. 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[ 0.50148778 0.00996088 0. 0. 0. 1. 0.63224396 0. 0. 0. ]
[ 0.47921832 -0.01873205 0. 0. 0. 0.63224396 1. 0. 0. 0. ]
[ 0. 0. 0. 0. 0. 0. 0. 1. 0.77507487 0.69697053]
[ 0. 0. 0. 0. 0. 0. 0. 0.77507487 1. 0.74861881]
[ 0. 0. 0. 0. 0. 0. 0. 0.69697053 0.74861881 1. ]]
Looks good. Then I have a simple invocation of DBSCAN:
from sklearn.cluster import DBSCAN
db = DBSCAN(eps=0.4, min_samples=3, metric='precomputed').fit(distances)
labels = db.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
And the resulting labels are:
[0 0 0 0 0 0 0 0 0 0]
What am I doing wrong? Why does it put all images into one cluster?
Reputation: 77454
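DBSCAN usually assumes a dissimilarity (distance), not a similarity. It can be implemented with a similarity threshold, too (see Generalized DBSCAN). With metric='precomputed', scikit-learn treats each matrix entry as a distance and counts two points as neighbors when their entry is <= eps, so the many 0 entries in your similarity matrix make almost every pair of images neighbors, and everything ends up in one cluster.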
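To see this concretely, here is a minimal sketch that runs on the matrix from the question. To keep it dependent on NumPy only, `dbscan_precomputed` is a hypothetical hand-written stand-in that approximates scikit-learn's `metric='precomputed'` neighbor rule (j is a neighbor of i when the entry is <= eps); in real code you would pass the matrix to `DBSCAN(..., metric='precomputed')` instead. Using `1 - S` as the distance is an assumption, just one simple choice for SSIM-like scores.

```python
import numpy as np

def dbscan_precomputed(D, eps, min_samples):
    """Hypothetical minimal DBSCAN on a precomputed distance matrix.

    j is a neighbor of i when D[i, j] <= eps (i counts itself only if
    D[i, i] <= eps); a point is core when it has >= min_samples neighbors.
    Returns a label array with -1 for noise.
    """
    n = D.shape[0]
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from this unlabeled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

# The SSIM (similarity) matrix from the question.
S = np.array([
    [1.0, -0.00893619, 0, 0, 0, 0.50148778, 0.47921832, 0, 0, 0],
    [-0.00893619, 1.0, 0, 0, 0, 0.00996088, -0.01873205, 0, 0, 0],
    [0, 0, 1.0, 0.57884212, 0, 0, 0, 0, 0, 0],
    [0, 0, 0.57884212, 1.0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1.0, 0, 0, 0, 0, 0],
    [0.50148778, 0.00996088, 0, 0, 0, 1.0, 0.63224396, 0, 0, 0],
    [0.47921832, -0.01873205, 0, 0, 0, 0.63224396, 1.0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1.0, 0.77507487, 0.69697053],
    [0, 0, 0, 0, 0, 0, 0, 0.77507487, 1.0, 0.74861881],
    [0, 0, 0, 0, 0, 0, 0, 0.69697053, 0.74861881, 1.0],
])

# Similarity used as distance: every point is "close" to the zero entries.
print(dbscan_precomputed(S, eps=0.4, min_samples=3).tolist())
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Distance = 1 - similarity, same parameters as the question.
print(dbscan_precomputed(1.0 - S, eps=0.4, min_samples=3).tolist())
# [-1, -1, -1, -1, -1, -1, -1, 0, 0, 0]

# Same distances with illustrative, re-tuned parameters.
print(dbscan_precomputed(1.0 - S, eps=0.5, min_samples=2).tolist())
# [0, -1, 1, 1, -1, 0, 0, 2, 2, 2]
```

With the question's parameters the similarity matrix collapses into a single cluster, while `1 - S` separates the images into groups. Because the scale of the values changes, `eps` and `min_samples` generally need re-tuning after the conversion; the `eps=0.5, min_samples=2` values above are only illustrative.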
Reputation: 970
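The problem was that I had computed the distance matrix incorrectly: in a proper distance matrix the entries on the main diagonal are all zero, since each image is at distance 0 from itself, but mine had ones there.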
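A minimal sketch of turning pairwise similarities into a valid distance matrix, assuming the similarity is SSIM-like (at most 1, with 1 meaning identical) and using `1 - S` as the distance. `similarity_to_distance` is a hypothetical helper, not part of any library. It forces an exact zero main diagonal and clips negative values, so the result could be passed as `distances` to `DBSCAN(..., metric='precomputed')`.

```python
import numpy as np

def similarity_to_distance(S):
    """Hypothetical helper: convert a symmetric similarity matrix
    (e.g. pairwise SSIM in [-1, 1]) into a distance matrix.

    Assumption: distance = 1 - similarity, which lies in [0, 2] for SSIM.
    """
    S = np.asarray(S, dtype=float)
    if S.ndim != 2 or S.shape[0] != S.shape[1]:
        raise ValueError("expected a square matrix")
    if not np.allclose(S, S.T):
        raise ValueError("similarity matrix must be symmetric")
    D = 1.0 - S
    D = np.clip(D, 0.0, None)   # distances must be non-negative
    np.fill_diagonal(D, 0.0)    # each image is at distance 0 from itself
    return D

S = np.array([[1.0, 0.6, -0.1],
              [0.6, 1.0, 0.2],
              [-0.1, 0.2, 1.0]])
D = similarity_to_distance(S)
print(np.diag(D).tolist())     # [0.0, 0.0, 0.0]
print(D.round(2).tolist())     # [[0.0, 0.4, 1.1], [0.4, 0.0, 0.8], [1.1, 0.8, 0.0]]
```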