synack
synack

Reputation: 1749

Split array into evenly-distributed chunks

I'm looking for an efficient way to split an array into chunks that have a similar histogram.

For example, given the array:

l = np.array([1, 2, 3, 4, 1, 1, 1, 2, 3, 4, 55, 5, 5, 5, 5, 5, 3, 2, 2, 21, 2])

I would like to obtain:

c1 = np.array([1, 4, 1, 5, 5, 3, 2])
c2 = np.array([2, 1, 3, 4, 5, 5, 2])
c3 = np.array([3, 1, 2, 55, 5, 2, 21])

Not only this but each chunk should have a similar size and a similar sum on a given function f:

1. |sum(ci, f) - sum(cj, f)| < e, for i != j
2. |len(ci) - len(cj)| < e, for i != j

where

sum(c, f) = f(c[0]) + ... + f(c[len(c)])

Edit: To clarify the intention of this. I want to parallelize a process over a list into n sub-processes but the cost has to be distributed among each sub-process evenly. The cost of processing an element in this list is a function f of the integer at the same position in array l, where f is the computational complexity of the process. For example, f(i)=i^2. So, I want all processes to have the same computational cost and not to have processes that finish too early while others last forever.

Upvotes: 6

Views: 1284

Answers (1)

mockinterface
mockinterface

Reputation: 14860

Let's start with a very weak assumption that similar histograms are defined in a following rudimentary way: for a set of integers S1, the histogram H(S1) is similar to an histogram H(S2) of set S2 if sum(S1) = sum(S2).

Generally you are after finding subsets S1, S2, ..., SN of an array A such that f(S1) = f(S2) = ... = f(SN), and where under our assumptions f=sum. Unfortunately you've got yourself a k-Partition problem, which is NP-hard, and if you'll get someone to find an efficient way (ie poly-time) to do that, as you requested, the consequence would be that P=NP was proven to be true first on stackoverflow!

Upvotes: 2

Related Questions