Split array into evenly-distributed chunks

Question

I'm looking for an efficient way to split an array into chunks that have a similar histogram.

For example, given the array:

l = np.array([1, 2, 3, 4, 1, 1, 1, 2, 3, 4, 55, 5, 5, 5, 5, 5, 3, 2, 2, 21, 2])

I would like to obtain:

c1 = np.array([1, 4, 1, 5, 5, 3, 2])
c2 = np.array([2, 1, 3, 4, 5, 5, 2])
c3 = np.array([3, 1, 2, 55, 5, 2, 21])

Not only this but each chunk should have a similar size and a similar sum on a given function f:

1. |sum(ci, f) - sum(cj, f)| < e, for i != j
2. |len(ci) - len(cj)| < e, for i != j

where

sum(c, f) = f(c[0]) + ... + f(c[len(c)])

Edit: To clarify the intention of this. I want to parallelize a process over a list into n sub-processes but the cost has to be distributed among each sub-process evenly. The cost of processing an element in this list is a function f of the integer at the same position in array l, where f is the computational complexity of the process. For example, f(i)=i^2. So, I want all processes to have the same computational cost and not to have processes that finish too early while others last forever.

Split array into evenly-distributed chunks

Answers (1)

Related Questions