Reputation: 897
This is a follow-up question on my other posts.
Algorithm for clustering with size constraints
I'm working on a clustering algorithm, After some reclustering, now I have this set of points that none of them are in their optimal cluster but could not be reassigned individually, since it'll violate the constraint.
I'm trying to use a graph structure to solve the problem but came across a few issues in implementing.
I'm a beginner, please let me know if I'm wrong.
Per @Kittsil's answer
build a directed graph with clusters as nodes, such that an edge (A,B) exists if the global solution would be minimized by some point in A moving to B. Looking for cycles in this graph will allow you to find potential moves (where a move consists of moving every vertex in the cycle).
I revised the graph adding weight as the sum of number of points moving from A to B.
Here are some scenarios where I'm not sure how to decide on which point to reassign.
Scenario 1. For a cycle as below. There are two points can be moved from A to B, and three from B to C. Under this situation, which point should I select to reassign?
Scenario 2. For a cycle as below. Let all edge weights be 1. For the point in cluster A, it could be reassign to either cluster B or D. In this case. How should I do the reassignment?
Scenario 3 Similar to Scenario 2. Let all edge weights be 1. There are two small cycles in the bigger cycle. The point in cluster A can be reassigned to both B and E, how do I decide on the reassignment in this case?
Ideas about either scenario is welcomed!
Also any suggestions on implementing the algorithm considering the cases above? Even better with pseudocode. Thanks!
Upvotes: 3
Views: 1021
Reputation: 2487
If the edge weights are proportional to the benefit gained by reclustering the points, then a decent heuristic is to pick the cycle with the highest total weight. I think this addresses all three of your cases: whenever you have a choice, pick the one that will do the most good.
Discussion:
The algorithm described in Algorithm for clustering with size constraints is a greedy algorithm to find a local minimum. Therefore, there is no guarantee that you will find the best solution no matter how you choose in these scenarios, and any choice will drive you closer to the local minimum.
Due to the relation to similar problems of sorting with constraints, it is my intuition that the minimum problem is going to be NP-hard. If that is not the case, then there exists a much better algorithm than the one I previously described. However, this greedy algorithm seems like a fast solution for the minimal problem.
Upvotes: 1