Reputation: 1321
I am looking for a fast algorithm for a specific problem, fast both in terms of complexity (the size of the problem may get close to 2^32) and in terms of the constant factor. It doesn't necessarily have to compute the optimal solution: a heuristic is acceptable if it produces results close to the optimal one and has a considerable advantage in computation time over computing the exact optimum.
I have an integer histogram A with |A| = n and A[i] > 0, and a value R with 0 < R <= A[0]+...+A[n-1]. I must distribute -R over the histogram as evenly as possible. Formally this means something like the following (there is some additional information in the formal notation too): I need to find B such that |B| = |A| and B[i] = A[i] - C[i], where 0 <= C[i] <= A[i] and C[0]+...+C[n-1] = R, and C must minimize the expressions L_2 = C[0]^2 + ... + C[n-1]^2 and L_infinity = max(C[0], ..., C[n-1]). Just from the formulation one can see that the problem doesn't necessarily have a unique solution (consider A[0] = 1, A[1] = 1 and R = 1; then both B[0]=0, B[1]=1 and B'[0]=1, B'[1]=0 are optimal solutions). An additional constraint could be added, such as: if A[i] < A[j] then C[i] < C[j], but that is not as important in my case. Naively one can iterate over all possibilities for C (an R-combination with repetition) and find the optimal solutions, but obviously that is not very fast for larger n.
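For sanity checking on very small inputs, an exhaustive search over all feasible C can at least serve as a reference to validate faster approaches against. A minimal sketch in Python (the function name is mine, not from any library); it is exponential in n and only meant as a correctness oracle:

from itertools import product

def brute_force_min_l2(A, R):
    # Enumerate every C with 0 <= C[i] <= A[i] and sum(C) == R and return
    # one that minimizes (L_2, L_infinity). Exponential time; reference only.
    best_key, best_C = None, None
    for C in product(*(range(a + 1) for a in A)):
        if sum(C) != R:
            continue
        key = (sum(c * c for c in C), max(C))
        if best_key is None or key < best_key:
            best_key, best_C = key, list(C)
    return best_C

print(brute_force_min_l2([1, 1], 1))     # [0, 1] (ties with [1, 0])
print(brute_force_min_l2([3, 4, 5], 7))  # [2, 2, 3]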
Another possible solution is to find q = R/n and r = R%n, then iterate over all elements and store diff[i] = A[i] - q; if diff[i] <= 0 then set r -= diff[i], B[i] = 0, and remove A[i]. Then continue with all non-removed A[i] by setting A[i] = diff[i], R = r, and n = n - removedElementsCount. Iterating this process, at each step we remove at least one element, until we reach the point where q == 0 or only 1 element is left. In the q == 0 case we just do A[i] -= 1 for R of the remaining elements (since by then R < n); if only 1 element is left over, we do A[i] -= R (the case with 0 elements left is trivial). Since we remove at least one element each step and iterate over (n - step) elements in the worst case, the complexity is O(1 + ... + n) = O(n^2).
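Here is a rough transcription of that iterative scheme into Python, just to make the steps concrete; it is my own reading of the description above (names included), so treat it as a sketch rather than a definitive implementation:

def distribute_iterative(A, R):
    # O(n^2) worst case: repeatedly take q = R // n from every surviving
    # bucket, retire buckets that get exhausted, and spread the remainder
    # at the end. Returns B with B[i] = A[i] - C[i]. Assumes R <= sum(A).
    B = list(A)
    alive = list(range(len(A)))          # indices of buckets still holding something
    while R > 0 and alive:
        n = len(alive)
        if n == 1:                       # single bucket left: it absorbs the rest
            B[alive[0]] -= R
            return B
        q, r = divmod(R, n)
        if q == 0:                       # R < n: take 1 from R of the surviving buckets
            for idx in alive[:R]:
                B[idx] -= 1
            return B
        survivors = []
        for idx in alive:
            if B[idx] <= q:              # bucket exhausted: take everything it has,
                r += q - B[idx]          # its shortfall goes back into the remainder
                B[idx] = 0
            else:
                B[idx] -= q              # take the base share q
                survivors.append(idx)
        alive = survivors
        R = r
    return B

print(distribute_iterative([3, 1, 5, 4], 7))   # [1, 0, 3, 2], i.e. C = [2, 1, 2, 2]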
I am hoping that somebody is already familiar with a better algorithm; if you have any ideas I'll be glad to hear them (I am aware that this can also be regarded as an optimization problem).
Edit: made R positive so it would be easier to read.
Edit 2: I realized I messed up the optimization criteria.
Upvotes: 0
Views: 309
Reputation: 1321
I came up with a very simple greedy algorithm that runs in O(n*log(n)) time (if somebody manages to solve it in O(n), though, I'll be glad to hear it).
Algorithm:
Given: an integer array A[0],...,A[|A|-1] with A[i]>=0, and an integer R0 with 0<=R0<=A[0]+...+A[|A|-1].
Base:
1. Sort A in ascending order - takes O(n*log(n)) time.
   Set i = 0; R = R0; n = |A|; q = floor(R/n); r = R - q*n; d = q;
2. if(i==|A| or R==0) goto 6.;
3. if(i>=|A|-r) d = q + 1;
4. if(A[i]>=d)
   {
       R-=d;
       A[i]-=d;
   }
   else
   {
       R-=A[i];
       A[i] = 0;
       n = |A|-(i+1);
       q = floor(R/n);
       d = q;
       r = R - q*n;
   }
5. i=i+1; goto 2.;
6. if(R>0) A[|A|-1] -= R; return A;
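For completeness, here is a direct transcription of the numbered steps above into Python; the index bookkeeping (so that the result comes back in the original order of A rather than in sorted order) and the function name are my own additions:

def reduce_evenly(A, R0):
    # Greedy O(n*log(n)) reduction following the numbered steps above.
    # Returns B with B[i] = A[i] - C[i]. Assumes 0 <= R0 <= sum(A), A non-empty.
    order = sorted(range(len(A)), key=lambda i: A[i])   # step 1: sort (by index)
    vals = [A[i] for i in order]                        # sorted working copy
    R, n = R0, len(vals)
    q, r = divmod(R, n)
    d = q
    i = 0
    while i < len(vals) and R > 0:                      # step 2
        if i >= len(vals) - r:                          # step 3: last r buckets take q+1
            d = q + 1
        if vals[i] >= d:                                # step 4
            vals[i] -= d
            R -= d
        else:                                           # bucket too small: empty it, rescale
            R -= vals[i]
            vals[i] = 0
            n = len(vals) - (i + 1)
            q, r = divmod(R, n)
            d = q
        i += 1                                          # step 5
    if R > 0:                                           # step 6: leftover goes to the largest bucket
        vals[-1] -= R
    B = [0] * len(A)
    for pos, idx in enumerate(order):                   # undo the sort
        B[idx] = vals[pos]
    return B

print(reduce_evenly([3, 1, 5, 4], 7))   # [1, 0, 3, 2], i.e. C = [2, 1, 2, 2]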
Informal proof of optimality:
Let n = |A|.
Case 0: n==1 -> C[0] = R
Case 1: n>1 && A[i]>=q && A[j]>=q+1 for j>=max(0,n-r)

The optimal solution is given by C[i] = q for i<n-r and C[j] = q+1 for j>=n-r.

Assume there is another optimal solution given by C'[i] = C[i] + E[i], where the constraints on E are: E[0]+...+E[n-1]=0 (otherwise C' would violate C'[0] + ... + C'[n-1] = R), C[i]>=-E[i] (otherwise C'[i] would violate the non-negativity constraint), E[i] <= A[i] - C[i] (from C'[i]<=A[i]), and E[i]<=E[j] for i<=j (from C[i]<=C[j] for A[i]<=A[j], and A[i]<=A[j] for i<=j). Then:

L_2' - L_2 = 2*q*(E[0]+...+E[n-r-1]) + 2*(q+1)*(E[n-r]+...+E[n-1]) + (E[0]^2 + ... + E[n-1]^2) = 2*q*0 + (E[0]^2 + ... + E[n-1]^2) + 2*(E[n-r] + ... + E[n-1]) >= 0
The last inequality is true since for every term 2*E[n-i], 1<=i<=r, there is a corresponding term E[n-i]^2, 1<=i<=r, to cancel it out when it is negative, at least for E[n-i]<-1. Let us analyze the case where 2*E[n-i] = -2; obviously E[n-i]^2 = 1 is not enough to cancel it out in this case. However, since all elements of E sum to 0, there exists j!=n-i such that E[j] compensates for it, since we have the term E[j]^2. From the last inequality it follows that L_2 <= L_2' for every possible solution C', which implies that C minimizes L_2. It is trivial to see that the L_inf minimization is also satisfied: L_inf = q + (r>0) <= L_inf' = max(q+E[0], ..., q+E[n-r-1], q+1+E[n-r], ..., q+1+E[n-1]). If we were to have E[i]>1 for some i<n-r, or E[j]>0 for some j>=n-r, we would get a higher maximum; and we can never decrease the maximum, since E sums to 0.
Case 2: n>1 && there exists k: A[k]<q

In this case the optimal solution requires that C[k] = A[k] for all k: A[k]<q. Let us assume that there exists an optimal solution C' such that C'[k]<A[k]<q -> C'[k]<q-1. Then there exists i>=k such that C'[i]<q-1 && C'[i+1]>=q-1. Assume there is no such i; then C'[k] <= ... <= C'[n-1] < q-1, and C'[0]+...+C'[n-1]<n*q-n<R, which is a contradiction, so such an i does exist. There also exists a j>k such that C'[j]>q && C'[j-1]<C'[j] (if we assume this is untrue we once again get a contradiction with C' summing to R). We needed these facts so that the modified solution below still satisfies the ordering constraint C[t]<=C[l] for t<=l. Let us consider the modified solution C''[t] = C'[t] for t!=i,j; C''[i] = C'[i]+1; and C''[j] = C'[j]-1. Then

L_2' - L_2'' = C'[i]^2 - (C'[i]+1)^2 + C'[j]^2 - (C'[j]-1)^2 = -2*C'[i] + 2*C'[j] - 2 = 2*((C'[j]-C'[i])-1) > 2*(1-1) = 0

The last inequality follows from (C'[i]<q-1 && C'[j]>q) -> C'[j] - C'[i] > 1. So L_2' > L_2'' whenever we increment such a C'[i] with C'[i]<A[i]<q (and decrement a larger C'[j]). By induction the optimal solution must have C[l]=A[l] for all l: A[l]<q. Once this is done one can continue inductively with the reduced problem (where i is now the largest index with A[i]<q): n' = n-(i+1), R' = R - (C[0]+...+C[i]), q' = floor(R'/n'), r' = R' - q'*n', D[0] = A[i+1], ..., D[n'-1] = A[n-1].
Case 3: n>1 && A[i]>=q && A[j]<q+1 for j==max(0,n-r)

Since A[k]>=A[i] for k>=i, this implies that A[i]<q+1 for i<=j. But since we also have q<=A[i], this implies A[i]==q, so we cannot add any of the remainder to any C[i] with i<=j. The optimality of C[i]=A[i]=q for i<j follows from the proof done in Case 1 (the proof there was more general, with q+1 terms). Since the solution is optimal for 0<=i<j, we can continue with the reduced problem D[0] = A[j], ..., D[n-j-1] = A[n-1].
Cases 0, 1, 2 and 3 cover all possibilities. Apart from Case 0 and Case 1, which give the solution explicitly, Cases 2 and 3 reduce the problem to a smaller one, which once again falls into one of the cases. Since the problem is reduced at every step, we get the final solution in a finite number of steps. We also never refer to an element more than once, which implies O(n) time, but we need O(n*log(n)) for the sorting, so in the end the algorithm has O(n*log(n)) time complexity. I am unsure whether this problem can be solved in O(n) time, but I have the feeling that there is no way to get away without the sorting, since Cases 2 and 3 rely on it heavily, so maybe O(n*log(n)) is the best complexity that can be achieved.
Upvotes: 0
Reputation: 46455
Turn your histogram into an array of (value, index) pairs, and then turn it into a min heap. This operation is O(n).

Now your C is going to take some set of values to 0, reduce some by the max amount, and reduce the rest by 1 less than the max amount. The max amount that you'd like to reduce everything by is easy to calculate: it is R/n rounded up.

Now go through the heap. As long as the smallest value in the heap is < ceil(R / size of heap), the value at that index will be set to zero; remove it from the heap in O(log(n)) time. Once that loop finishes, you can assign the max value and 1 less than the max value randomly to the rest.

This will run in O(n log(n)) worst-case time. You will hit that worst case when O(n) elements have to be zeroed out.
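A minimal sketch of this approach in Python, using heapq as the min heap (the function name is mine, and the split between the max amount and one less than it is done deterministically here instead of randomly, which doesn't change L_2 or L_infinity):

import heapq

def reduce_with_heap(A, R):
    # Zero out buckets smaller than the per-bucket target, then split the
    # rest of R as q+1 over r buckets and q over the others. Assumes R <= sum(A).
    heap = [(value, index) for index, value in enumerate(A)]   # (value, index) pairs
    heapq.heapify(heap)                                        # O(n) min-heap build
    C = [0] * len(A)
    while heap and heap[0][0] < -(-R // len(heap)):            # smallest < ceil(R / heap size)
        value, index = heapq.heappop(heap)                     # O(log n) removal
        C[index] = value                                       # this bucket goes to zero
        R -= value
    if heap:
        q, r = divmod(R, len(heap))
        for k, (value, index) in enumerate(heap):              # heap order is arbitrary here
            C[index] = q + 1 if k < r else q
    return [a - c for a, c in zip(A, C)]                       # B = A - C

print(reduce_with_heap([3, 1, 5, 4], 7))   # [1, 0, 3, 2], i.e. C = [2, 1, 2, 2]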
Upvotes: 1