Reputation: 1
I have a randomly generated dataset of synthetic clusters: 25 files for each dimension from 2 to 100, where each file can contain up to 6 clusters with up to 15 points each. My issue is that the adjusted mutual information (AMI) scores and adjusted Rand indices for BIRCH and Agglomerative Clustering (scikit-learn implementations) always seem to be equal from dimension 10 or so upwards. Below that dimension there are some differences, so I am not accidentally calling the same function twice. The clusters overlap substantially in some files and are well separated in others. Since each value is the mean of 25 tests for a given dimension, and the means agree to 8 decimal places, I am baffled as to how they can be the same. This is creepy, spooky, whatever you want to call it. Why might this be happening?
I looked through the raw data, and the adjusted Rand and AMI scores do match in higher dimensions, file by file and not just in the means. I also went through my code looking for a typo and found none.
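For anyone who wants to reproduce the comparison, here is a minimal sketch. It makes several assumptions that are not from my actual setup: `make_blobs` stands in for my data files, both estimators use their scikit-learn defaults (BIRCH with `threshold=0.5`, Agglomerative Clustering with Ward linkage), and the helper name `compare` is made up for illustration. Besides scoring each method against the true labels, it compares the two label vectors with each other and reports how many subclusters BIRCH built. That count seems worth checking because, according to the scikit-learn documentation, BIRCH with an integer `n_clusters` runs Agglomerative Clustering on its subclusters as the final step.

```python
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score


def compare(dim, n_clusters=6, points_per_cluster=15, seed=0):
    """Cluster one synthetic dataset with both algorithms and compare them."""
    n_samples = n_clusters * points_per_cluster
    # Assumption: make_blobs is only a stand-in for the real data files.
    X, y_true = make_blobs(
        n_samples=n_samples,
        centers=n_clusters,
        n_features=dim,
        random_state=seed,
    )

    # Both estimators are used with their default settings (assumption).
    birch = Birch(n_clusters=n_clusters).fit(X)
    agglo = AgglomerativeClustering(n_clusters=n_clusters).fit(X)

    return {
        "dim": dim,
        "n_samples": n_samples,
        # If this equals n_samples, BIRCH did not compress the data at all.
        "n_subclusters": len(birch.subcluster_centers_),
        # Agreement between the two methods, independent of the true labels.
        "ari_between_methods": adjusted_rand_score(birch.labels_, agglo.labels_),
        "ari_birch": adjusted_rand_score(y_true, birch.labels_),
        "ari_agglo": adjusted_rand_score(y_true, agglo.labels_),
        "ami_birch": adjusted_mutual_info_score(y_true, birch.labels_),
        "ami_agglo": adjusted_mutual_info_score(y_true, agglo.labels_),
    }


if __name__ == "__main__":
    for dim in (2, 5, 10, 50, 100):
        print(compare(dim))
```

If `n_subclusters` equals `n_samples` in the higher dimensions and `ari_between_methods` is 1.0 there, the two methods are producing the same partition rather than merely similar scores, which would at least narrow down where to look (for example, at the BIRCH `threshold`).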
Upvotes: 0
Views: 106