Reputation: 1321
I am looking for a fast algorithm for a specific problem, fast both in terms of complexity (the size of the problem may get close to 2^32) and in terms of the constant factor. It doesn't necessarily have to compute the optimal solution: a heuristic is acceptable if it produces results close to the optimal one and has a considerable advantage in computation time over computing the exact optimum.
I have an integer histogram A with |A| = n and A[i] > 0, and a value R with 0 < R <= A[0]+...+A[n-1]. I must distribute -R over the histogram as evenly as possible. Formally this means something like the following (there is some additional information in the formal notation too): I need to find B such that |B| = |A| and B[i] = A[i] - C[i], where 0 <= C[i] <= A[i] and C[0]+...+C[n-1] = R, and C must minimize the expressions L_2 = C[0]^2 + ... + C[n-1]^2 and L_infinity = max(C[0], ..., C[n-1]). Just from the formulation one can see that the problem doesn't necessarily have a unique solution (consider A[0] = 1, A[1] = 1 and R = 1; then both B[0]=0, B[1]=1 and B'[0]=1, B'[1]=0 are optimal solutions). An additional constraint could be added, such as: if A[i] < A[j] then C[i] < C[j], but that is not as important in my case. Naively one can iterate over all possibilities for C (an R-combination with repetition) and find the optimal solutions, but obviously that is not very fast for larger n.
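For sanity checking on very small inputs, an exhaustive search over all feasible C can at least serve as a reference to validate faster approaches against. A minimal sketch in Python (the function name is mine, not from any library); it is exponential in n and only meant as a correctness oracle:

from itertools import product

def brute_force_min_l2(A, R):
    # Enumerate every C with 0 <= C[i] <= A[i] and sum(C) == R and return
    # one that minimizes (L_2, L_infinity). Exponential time; reference only.
    best_key, best_C = None, None
    for C in product(*(range(a + 1) for a in A)):
        if sum(C) != R:
            continue
        key = (sum(c * c for c in C), max(C))
        if best_key is None or key < best_key:
            best_key, best_C = key, list(C)
    return best_C

print(brute_force_min_l2([1, 1], 1))     # [0, 1] (ties with [1, 0])
print(brute_force_min_l2([3, 4, 5], 7))  # [2, 2, 3]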
Another possible solution is to find q = R/n and r = R%n, then iterate over all elements and store diff[i] = A[i] - q; if diff[i] <= 0 then set r -= diff[i], B[i] = 0, and remove A[i]. Then continue with all non-removed A[i] by setting A[i] = diff[i], R = r, and n = n - removedElementsCount. Iterating this process, at each step we remove at least one element, until we reach the point where q == 0 or only 1 element is left. In the q == 0 case we just do A[i] -= 1 for R of the remaining elements (since by then R < n); if only 1 element is left over, we do A[i] -= R (the case with 0 elements left is trivial). Since we remove at least one element each step and iterate over (n - step) elements in the worst case, the complexity is O(1 + ... + n) = O(n^2).
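Here is a rough transcription of that iterative scheme into Python, just to make the steps concrete; it is my own reading of the description above (names included), so treat it as a sketch rather than a definitive implementation:

def distribute_iterative(A, R):
    # O(n^2) worst case: repeatedly take q = R // n from every surviving
    # bucket, retire buckets that get exhausted, and spread the remainder
    # at the end. Returns B with B[i] = A[i] - C[i]. Assumes R <= sum(A).
    B = list(A)
    alive = list(range(len(A)))          # indices of buckets still holding something
    while R > 0 and alive:
        n = len(alive)
        if n == 1:                       # single bucket left: it absorbs the rest
            B[alive[0]] -= R
            return B
        q, r = divmod(R, n)
        if q == 0:                       # R < n: take 1 from R of the surviving buckets
            for idx in alive[:R]:
                B[idx] -= 1
            return B
        survivors = []
        for idx in alive:
            if B[idx] <= q:              # bucket exhausted: take everything it has,
                r += q - B[idx]          # its shortfall goes back into the remainder
                B[idx] = 0
            else:
                B[idx] -= q              # take the base share q
                survivors.append(idx)
        alive = survivors
        R = r
    return B

print(distribute_iterative([3, 1, 5, 4], 7))   # [1, 0, 3, 2], i.e. C = [2, 1, 2, 2]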
I am hoping that somebody is already familiar with a better algorithm; if you have any ideas I'll be glad to hear them (I am aware that this can also be regarded as an optimization problem).
Edit: made R positive so it would be easier to read.
Edit 2: I realized I messed up the optimization criteria.
Upvotes: 0
Views: 309
Reputation: 1321
I came up with a very simple greedy algorithm that runs in O(n*log(n)) time (if somebody manages to solve it in O(n), though, I'll be glad to hear it).
Algorithm:
Given: an integer array A[0],...,A[|A|-1] with A[i]>=0, and an integer R0 with 0<=R0<=A[0]+...+A[|A|-1].
Base:
1. Sort A in ascending order - takes O(n*log(n)) time.
   Set i = 0; R = R0; n = |A|; q = floor(R/n); r = R - q*n; d = q;
2. if(i==|A| or R==0) goto 6.;
3. if(i>=|A|-r) d = q + 1;
4. if(A[i]>=d)
   {
       R-=d;
       A[i]-=d;
   }
   else
   {
       R-=A[i];
       A[i] = 0;
       n = |A|-(i+1);
       q = floor(R/n);
       d = q;
       r = R - q*n;
   }
5. i=i+1; goto 2.;
6. if(R>0) A[|A|-1] -= R; return A;
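For completeness, here is a direct transcription of the numbered steps above into Python; the index bookkeeping (so that the result comes back in the original order of A rather than in sorted order) and the function name are my own additions:

def reduce_evenly(A, R0):
    # Greedy O(n*log(n)) reduction following the numbered steps above.
    # Returns B with B[i] = A[i] - C[i]. Assumes 0 <= R0 <= sum(A), A non-empty.
    order = sorted(range(len(A)), key=lambda i: A[i])   # step 1: sort (by index)
    vals = [A[i] for i in order]                        # sorted working copy
    R, n = R0, len(vals)
    q, r = divmod(R, n)
    d = q
    i = 0
    while i < len(vals) and R > 0:                      # step 2
        if i >= len(vals) - r:                          # step 3: last r buckets take q+1
            d = q + 1
        if vals[i] >= d:                                # step 4
            vals[i] -= d
            R -= d
        else:                                           # bucket too small: empty it, rescale
            R -= vals[i]
            vals[i] = 0
            n = len(vals) - (i + 1)
            q, r = divmod(R, n)
            d = q
        i += 1                                          # step 5
    if R > 0:                                           # step 6: leftover goes to the largest bucket
        vals[-1] -= R
    B = [0] * len(A)
    for pos, idx in enumerate(order):                   # undo the sort
        B[idx] = vals[pos]
    return B

print(reduce_evenly([3, 1, 5, 4], 7))   # [1, 0, 3, 2], i.e. C = [2, 1, 2, 2]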
Informal proof of optimality:
Let n = |A|.
Case 0: n==1 -> C[0] = R
Case 1: n>1 && A[i]>=q && A[j]>=q+1 for j>=max(0,n-r)

The optimal solution is given by C[i] = q for i<n-r and C[j] = q+1 for j>=n-r.

Assume there is another optimal solution given by C'[i] = C[i] + E[i], where the constraints on E are: E[0]+...+E[n-1]=0 (otherwise C' would violate C'[0] + ... + C'[n-1] = R), C[i]>=-E[i] (otherwise C'[i] would violate the non-negativity constraint), E[i] <= A[i] - C[i] (from C'[i]<=A[i]), and E[i]<=E[j] for i<=j (from C[i]<=C[j] for A[i]<=A[j], and A[i]<=A[j] for i<=j). Then:

L_2' - L_2 = 2*q*(E[0]+...+E[n-r-1]) + 2*(q+1)*(E[n-r]+...+E[n-1]) + (E[0]^2 + ... + E[n-1]^2) = 2*q*0 + (E[0]^2 + ... + E[n-1]^2) + 2*(E[n-r] + ... + E[n-1]) >= 0
The last inequality is true since for every term 2*E[n-i], 1<=i<=r, there is a corresponding term E[n-i]^2, 1<=i<=r, to cancel it out when it is negative, at least for E[n-i]<-1. Let us analyze the case where 2*E[n-i] = -2; obviously E[n-i]^2 = 1 is not enough to cancel it out in this case. However, since all elements of E sum to 0, there exists j!=n-i such that E[j] compensates for it, since we have the term E[j]^2. From the last inequality it follows that L_2 <= L_2' for every possible solution C', which implies that C minimizes L_2. It is trivial to see that the L_inf minimization is also satisfied: L_inf = q + (r>0) <= L_inf' = max(q+E[0], ..., q+E[n-r-1], q+1+E[n-r], ..., q+1+E[n-1]). If we were to have E[i]>1 for some i<n-r, or E[j]>0 for some j>=n-r, we would get a higher maximum; and we can never decrease the maximum, since E sums to 0.
Case 2: n>1 && there exists k: A[k]<q

In this case the optimal solution requires that C[k] = A[k] for all k: A[k]<q. Let us assume that there exists an optimal solution C' such that C'[k]<A[k]<q -> C'[k]<q-1. Then there exists i>=k such that C'[i]<q-1 && C'[i+1]>=q-1. Assume there is no such i; then C'[k] <= ... <= C'[n-1] < q-1, and C'[0]+...+C'[n-1]<n*q-n<R, which is a contradiction, so such an i does exist. There also exists a j>k such that C'[j]>q && C'[j-1]<C'[j] (if we assume this is untrue we once again get a contradiction with C' summing to R). We needed these facts so that the modified solution below still satisfies the ordering constraint C[t]<=C[l] for t<=l. Let us consider the modified solution C''[t] = C'[t] for t!=i,j; C''[i] = C'[i]+1; and C''[j] = C'[j]-1. Then

L_2' - L_2'' = C'[i]^2 - (C'[i]+1)^2 + C'[j]^2 - (C'[j]-1)^2 = -2*C'[i] + 2*C'[j] - 2 = 2*((C'[j]-C'[i])-1) > 2*(1-1) = 0

The last inequality follows from (C'[i]<q-1 && C'[j]>q) -> C'[j] - C'[i] > 1. So L_2' > L_2'' whenever we increment such a C'[i] with C'[i]<A[i]<q (and decrement a larger C'[j]). By induction the optimal solution must have C[l]=A[l] for all l: A[l]<q. Once this is done one can continue inductively with the reduced problem (where i is now the largest index with A[i]<q): n' = n-(i+1), R' = R - (C[0]+...+C[i]), q' = floor(R'/n'), r' = R' - q'*n', D[0] = A[i+1], ..., D[n'-1] = A[n-1].
Case 3: n>1 && A[i]>=q && A[j]<q+1 for j==max(0,n-r)

Since A[k]>=A[i] for k>=i, this implies that A[i]<q+1 for i<=j. But since we also have q<=A[i], this implies A[i]==q, so we cannot add any of the remainder to any C[i] with i<=j. The optimality of C[i]=A[i]=q for i<j follows from the proof done in Case 1 (the proof there was more general, with q+1 terms). Since the solution is optimal for 0<=i<j, we can continue with the reduced problem D[0] = A[j], ..., D[n-j-1] = A[n-1].
Cases 0, 1, 2 and 3 cover all possibilities. Apart from Case 0 and Case 1, which give the solution explicitly, Cases 2 and 3 reduce the problem to a smaller one, which once again falls into one of the cases. Since the problem is reduced at every step, we get the final solution in a finite number of steps. We also never refer to an element more than once, which implies O(n) time, but we need O(n*log(n)) for the sorting, so in the end the algorithm has O(n*log(n)) time complexity. I am unsure whether this problem can be solved in O(n) time, but I have the feeling that there is no way to get away without the sorting, since Cases 2 and 3 rely on it heavily, so maybe O(n*log(n)) is the best complexity that can be achieved.
Upvotes: 0
Reputation: 46455
Turn your histogram into an array of (value, index) pairs, and then turn it into a min heap. This operation is O(n).

Now your C is going to take some set of values to 0, reduce some by the max amount, and reduce the rest by 1 less than the max amount. The max amount that you'd like to reduce everything by is easy to calculate: it is R/n rounded up.

Now go through the heap. As long as the smallest value in the heap is < ceil(R / size of heap), the value at that index will be set to zero; remove it from the heap in O(log(n)) time. Once that loop finishes, you can assign the max value and 1 less than the max value randomly to the rest.

This will run in O(n log(n)) worst-case time. You will hit that worst case when O(n) elements have to be zeroed out.
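A minimal sketch of this approach in Python, using heapq as the min heap (the function name is mine, and the split between the max amount and one less than it is done deterministically here instead of randomly, which doesn't change L_2 or L_infinity):

import heapq

def reduce_with_heap(A, R):
    # Zero out buckets smaller than the per-bucket target, then split the
    # rest of R as q+1 over r buckets and q over the others. Assumes R <= sum(A).
    heap = [(value, index) for index, value in enumerate(A)]   # (value, index) pairs
    heapq.heapify(heap)                                        # O(n) min-heap build
    C = [0] * len(A)
    while heap and heap[0][0] < -(-R // len(heap)):            # smallest < ceil(R / heap size)
        value, index = heapq.heappop(heap)                     # O(log n) removal
        C[index] = value                                       # this bucket goes to zero
        R -= value
    if heap:
        q, r = divmod(R, len(heap))
        for k, (value, index) in enumerate(heap):              # heap order is arbitrary here
            C[index] = q + 1 if k < r else q
    return [a - c for a, c in zip(A, C)]                       # B = A - C

print(reduce_with_heap([3, 1, 5, 4], 7))   # [1, 0, 3, 2], i.e. C = [2, 1, 2, 2]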
Upvotes: 1