Algorithm for partitioning set of sets into approximately equal-sized chunks?

Question

Consider a set A of n finite sets, whose members are not necessarily disjoint. Let P={P[1], P[2], ..., P[m]} be a partition of A, and for each i in 1..m, let U[i] be the union of all of the elements of P[i]. So U={U[1], U[2], ..., U[m]}. I would like an algorithm to a find a P such that the corresponding U is a partition, and such that the difference in cardinality (i.e. size) between the smallest and largest elements of U is minimised.

Characteristics of the data:

m is small (2 to 5) and n<10000
Typically, there is a large proportion of 1-element sets in A
Intersections between pairs of sets in A are typically small or empty

Robin Green · Accepted Answer

My necklace analogy in the question comments suggests this solution:

Build an undirected graph G whose vertices are the elements of A, and where there is an edge from A[i] to A[j] iff A[i] intersects A[j].
Find the connected components C of G. This can be done with a simple breadth-first or depth-first algorithm.
For each C[i], take the vertices of C[i] and union them together, yielding D[i]. You now have reduced the problem to a special case, because the set D is a partition of the union of the elements of A.
Use a bin-packing algorithm to try and fit the elements of D into precisely m bins, each of size ceil(t/m) where t is the size of the union of all the elements of D. If that fails, increase the sizes of the bins repeatedly until either it succeeds or it's clear that it's never going to succeed. Bin-packing algorithms are typically heuristic, so a perfect solution might not be found. Also, this is more than a simple bin-packing problem, so even a perfect bin-packing algorithm might not find the optimal solution.

I'd be interested to know if there is a more efficient solution. In particular, I have a hunch that the repeated use of the bin-packing algorithm in step 4 is not sensible.

Algorithm for partitioning set of sets into approximately equal-sized chunks?

Answers (2)

Related Questions