Reputation: 85
I have a list of points (x, y, pointNo), e.g.:
[(344, 279, 0), (344, 276, 1), (342, 267, 2), (349, 259, 3), (348, 279, 4), (339, 268, 5), (343, 277, 6), (336, 275, 7), (344, 262, 8), (346, 269, 9), (279, 292, 10), (287, 287, 11), (278, 294, 12), (273, 294, 13), (280, 296, 14), (273, 291, 15), (287, 284, 16), (273, 292, 17), (273, 282, 18), (279, 296, 19), (210, 221, 20), (196, 230, 21), (191, 216, 22), (211, 221, 23), (192, 217, 24), (195, 230, 25), (192, 214, 26), (208, 225, 27), (206, 217, 28), (206, 224, 29), (176, 104, 30), (174, 114, 31), (180, 96, 32), (174, 103, 33), (171, 110, 34), (185, 114, 35), (179, 114, 36), (188, 100, 37), (183, 112, 38), (190, 115, 39), (274, 67, 40), (260, 62, 41), (264, 65, 42), (277, 78, 43), (274, 65, 44), (272, 75, 45), (260, 64, 46), (263, 68, 47), (259, 79, 48), (270, 64, 49), (344, 136, 50), (355, 129, 51), (344, 132, 52), (340, 122, 53), (348, 125, 54), (341, 136, 55), (343, 119, 56), (350, 136, 57), (348, 116, 58), (339, 135, 59), (213, 281, 60), (143, 211, 61), (125, 130, 62), (138, 241, 63), (350, 195, 64), (374, 189, 65), (362, 180, 66), (364, 187, 67), (375, 177, 68), (362, 187, 69), (364, 171, 70), (366, 180, 71), (366, 176, 72), (372, 178, 73), (366, 188, 74), (125, 132, 75), (125, 127, 76), (136, 140, 77), (120, 122, 78), (129, 134, 79), (124, 131, 80), (125, 138, 81), (128, 139, 82), (134, 124, 83), (123, 138, 84)]
How can I divide this list into clusters without specifying the number of clusters in advance? My second problem is how to get the center of every cluster.
I've found that k-means is a nice tool, but it requires the number of clusters as an input.
Upvotes: 1
Views: 6817
Reputation: 17629
Maybe a little late to the party, but there is a nice comparison of clustering algorithms in the sklearn documentation. Maybe one of them fits your needs.
Upvotes: 3
Reputation: 6855
1) To find the number of clusters, you have to define a threshold that tells the algorithm how much two tuples must differ before they are considered to belong to two different groups. For example, consider two groups of coins, 5-cent and 2-cent, where each denomination has a different weight. Let's say your algorithm clusters these coins by weight. The 5-cent coins will not all weigh exactly the same (they might differ by 10 milligrams or so), so your threshold should tolerate that. But if two coins differ by, say, 1 gram, then they definitely belong to two different groups. In your case you could use the Euclidean distance to measure the difference between tuples. Coming up with the threshold is a challenging task, though, and might require a lot of knowledge about the problem domain. You could therefore try different thresholds until you see satisfying results.
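As a rough illustration of the threshold idea, here is a minimal sketch that puts two points in the same cluster whenever a chain of neighbours, each within the threshold, connects them. The function name `cluster_by_threshold`, the union-find approach, and the threshold of 15 are my own assumptions for this example, not the only way to do it. The sample is a handful of points taken from the question.

```python
import math


def cluster_by_threshold(points, threshold):
    """Group (x, y, point_no) tuples by Euclidean distance.

    Two points end up in the same cluster when they are linked by a chain
    of points whose consecutive distances are all <= threshold.
    """
    parent = list(range(len(points)))  # union-find forest, one node per point

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of points that lies within the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) <= threshold:
                parent[find(i)] = find(j)

    # Collect points by the root of their tree.
    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


# A few points from the question; the threshold of 15 is a guess to tune.
sample = [(344, 279, 0), (344, 276, 1), (342, 267, 2),
          (279, 292, 10), (287, 287, 11), (278, 294, 12),
          (213, 281, 60)]
clusters = cluster_by_threshold(sample, threshold=15)
for cluster in clusters:
    print(cluster)
```

The number of clusters falls out of the threshold instead of being given up front. The pairwise loop is quadratic in the number of points, which is fine for a list of this size.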
2) Once you have set the number of clusters, you can find their centers. The center of a cluster is simply the average of the x and y values of all the elements belonging to that cluster. If you don't know which elements belong to which cluster, you can first place the cluster centers at random in the 2D space, then take the average (x, y) of the elements closest to each center to get its new center, and repeat until the centers stop moving. This is what k-means does.
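To make the center computation concrete, here is a small sketch of both steps: averaging a known cluster, and the k-means loop for when the memberships are unknown. The names `centroid` and `kmeans`, the fixed iteration count, and the seed are assumptions for illustration. I also start the centers at randomly chosen input points rather than arbitrary positions in the plane, which is a common simplification.

```python
import math
import random


def centroid(cluster):
    """Average x and y of a list of (x, y, point_no) tuples."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest center, then re-average."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct points drawn from the input.
    centers = [p[:2] for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: math.hypot(p[0] - centers[c][0],
                                         p[1] - centers[c][1]),
            )
            groups[nearest].append(p)
        # Keep the old center if a group ended up empty.
        centers = [centroid(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups


# Center of a cluster whose members are already known:
print(centroid([(344, 279, 0), (344, 276, 1), (342, 267, 2)]))
```

If you already have the clusters from a threshold-based grouping, `centroid` alone is enough; `kmeans` is only needed when you fix the number of clusters yourself.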
Hope this helps!!
Upvotes: 1