Reputation: 71
I was trying to split a binary-tree into k similar-sized parts (by removing k-1 edges). Is there any efficient algorithm for this problem? Or is it NP-hard? Any pointers to papers, problem definitions, etc?
-- One reasonable metric for evaluating the quality of partitioning could be the size gap between the largest and smallest partition; another metric could be making the smallest partition having as many vertices as possible.
Upvotes: 3
Views: 2989
Reputation: 18566
Here is a polynomial deterministic solution:
Let's assume that the tree is rooted and there are two fixed values: MIN
and MAX
- minimum and maximum allowed size of one component.
Then one can use dynamic programming to check if there is a partition such that each component size is between MIN
and MAX
:
Let's assume f(node, cuts_count, current_count)
is true
if and only if there is a way to make exactly cuts_count
cuts in node
's subtree so that current_count
vertices are connected to the node
so that condition 2) holds true.
The base case for the leaves is: f(leaf, 1, 0)
(cut the edge from the parent to the leaf) is true
if and only if MIN <= 1
and MAX >= 1
f(leaf, 0, 1)
(do not cut it) is always true
(it is false
for all other values of cuts_count
and current_count
).
To compute f
for a node
(not a leaf), one can use the following algorithm:
//Combine all possible children states.
for cuts_left in 0..k
for cuts_right in 0..k
for cnt_left in 0..left_subtree_size
for cnt_right in 0..right_subtree_size
if f(left_child, cuts_left, cnt_left) is true and
f(right_child, cuts_right, cnt_right) is true and then
f(node, cuts_left + cuts_right, cnt_left + cnt_right + 1) = true
//Cut an edge from this node to its parent.
for cuts in 0..k-1
for cnt in 0..node's_subtree_size
if f(node, cuts, node's_subtree_size) is true and MIN <= cnt <= MAX:
f(node, cuts + 1, 0) = true
What this pseudo code does is combining all possible states of node's children to compute all reachable states for this node(the first bunch of for loops) and then produces the rest of reachable states by cutting the edge between this node and its parent(the second bunch of for loops)(the state means (node, cuts_count, current_count)
tuple. I call it reachable if f(state)
is true
).
That is the case for a node with two children, the case with one child can be processes in similar manner.
Finally, if f(root, k, 0)
is true
then it is possible to find the partition which stratifies the condition 2) and it is not possible otherwise. We need to "pretend" that we did k
cuts here because we also cut an imaginary edge from root to its parent(this edge and this parent doesn't exist actually) when we computed f
for the root(to avoid corner case).
The space complexity of this algorithm(for fixed MIN
and MAX
) is O(n^2 * k)
(n
is the number of nodes), time complexity is O(k^2 * n^2)
. It might seem that the complexity is actually O(k^2 * n^3)
, but is not so because the product of number of vertices in left and right subtree of a node is exactly the number of pairs of node's such that their least common ancestor is this node. But the total number of pairs of nodes is O(n^2)
(and each pair has only one least common ancestor). Thus, the sum of products of left and right subtree sizes over all nodes is O(n^2)
.
One can simply try all possible MIN
and MAX
values and choose the best, but it can be done faster. The key observation is that if there is a solution for MIN
and MAX
, there is always a solution for MIN
and MAX + 1
. Thus, one can iterate over all possible values of MIN
(n / k
different values) and apply binary search to find the smallest MAX
which gives a valid solution(log n
iterations). So the overall time complexity is O(n^2 * k^2 * n / k * log n) = O(n^3 * k * log n)
. However, if you want to maximize MIN
(not to minimize the difference between MAX
and MIN
), you can simply use this algorithm and ignore MAX
value everywhere(by setting its value to n
). Then no binary search over MAX
would be required, but one would be able to binary search over MIN
instead and obtain an O(n^2 * k^2 * log n)
solution.
To reconstruct the partition itself, one can start from f(root, k, 0)
and apply the steps we used to compute f
, but this time in opposite direction(from root to leaves). It is also possible to save the information about how to get the value of each state(what children's states were combined or what was the state before the edge was cut)(and update it appropriately during the initial computation of f
) and then reconstruct the partition using this data(if my explanation of this step seems not very clear, reading an article on dynamic programming and reconstructing the answer might help).
So, there is a polynomial solution for this problem on a binary tree(even though it is NP-hard for an arbitrary graph).
Upvotes: 3
Reputation: 861
I can suggest pretty fast solution for making the smallest part having as many vertices as possible metric.
Let suppose we guess the size S of smallest partit and want check if it's correct. First I want to make a few statements:
If total size of tree bigger than S there is at least one subtree which is bigger than S and all subtrees of that subtree are smaller. (It's enough to check both biggest.)
If there is some way to split tree where size of smallest part >= S and we have subtree T all subtrees of which are smaller than S than we can grant that no edges inside T are deleted. (Cause any such deletion will create a partition which will be smaller than S)
If there is some way to split tree where size of smallest part >= S, and we have some subtree T which size >= S, has no deleted edges inside but is not one of parts, we can split the tree in other way where subtree T will be one of parts itself and all parts will be no smaller than S. (Just move some extra vertices from original part to any other part, this other part will not become smaller.)
So here is an algorithm to check if we can split the tree in k parts no smaller than S.
find all suitable vertices (roots of subtrees of size >= S and size for both childs < S) and add them in list. You can start from the root and move through all vertices while subtrees are bigger than S.
While list not empty and number of parts lesser then K take a vertice from the list and cut it off the tree. Than update size of subtrees for parent vertices and add to the list if one of them become suitable. You even have no need to update all the parent vertices, only until you will find first which's new subtree size is bigger than S, parent vertices cant't be suitable for adding in list yet and can be updated later.
You may need to construct tree back to restore original subtree sizes assigned to the vertices.
Now we can use bisection method. We can determine upper bound as Smax = n/k and lower bound can be retrieved from equation (2*Smin- 1)*(K - 1) + Smin = N it will grants that if we will cut off k-1 subtrees with two child subtrees of size Smin - 1 each, we will have part of size Smin left. Smin = (n + k -1)/(2*k - 1) And now we can check S = (Smax + Smin)/2 If we manage to construct partition using the method above than S is smaller or equal to it's largest possible value, also smallest part in constructed partition may be bigger than S and we can set new lower bound to it instead of S, if we fail S is bigger than possible.
Time complexity of one check is k multiplied by number of parent nodes updated each time, for well balanced tree number of updated nodes is constant (we will use trick explaned earlier and will not update all parent nodes), still it's not bigger than (n/k) in worst case for ultimately unbalanced tree. Searching for suitable vertices has very similar behavior (all vertices passed while searching will be updated later.).
Difference between n/k and (n + k -1)/(2*k - 1) is proportional to n/k.
So we have time complexity O(k * log(n/k)) in best case if we have precalculated subtree sizes, O(n) if subtree sizes are not precalculated and O(n * log(n/k)) in worst case.
This method may lead to situation when last of parts will be comparably big but I suppose once you've got suggested method you can figure out some improvements to minimize it.
Upvotes: 3