Reputation: 10998
I have a need to write code that will prorate a value across a list, based on the relative weights of "basis" values in the list. Simply dividing the "basis" values by the sum of the "basis" values and then multiplying the factor by the original value to prorate works to a certain degree:
proratedValue = (basis / basisTotal) * prorationAmount;
However, the result of this calculation must then be rounded to integer values. The effect of the rounding means that the the sum of proratedValue for all items in the list may differ from the original prorationAmount.
Can anyone explain how to apply a "lossless" proration algorithm that proportionately distributes a value across a list as accurately as possible, without suffering from rounding errors?
Upvotes: 25
Views: 21775
Reputation: 527328
Simple algorithm sketch here...
This is guaranteed to have the total amount prorated equal to the input prorate amount, because you never actually modify the running total itself (you only take rounded values of it for other calculations, you don't write them back). What would have been an issue with integer rounding before is now dealt with, since the rounding error will add up over time in the running total and eventually push a value across the rounding threshold in the other direction.
Basic example:
Input basis: [0.2, 0.3, 0.3, 0.2]
Total prorate: 47
----
R used to indicate running total here:
R = 0
First basis:
oldR = R [0]
R += (0.2 / 1.0 * 47) [= 9.4]
results[0] = int(R) - int(oldR) [= 9]
Second basis:
oldR = R [9.4]
R += (0.3 / 1.0 * 47) [+ 14.1, = 23.5 total]
results[1] = int(R) - int(oldR) [23-9, = 14]
Third basis:
oldR = R [23.5]
R += (0.3 / 1.0 * 47) [+ 14.1, = 37.6 total]
results[1] = int(R) - int(oldR) [38-23, = 15]
Fourth basis:
oldR = R [37.6]
R += (0.2 / 1.0 * 47) [+ 9.4, = 47 total]
results[1] = int(R) - int(oldR) [47-38, = 9]
9+14+15+9 = 47
Upvotes: 19
Reputation: 1020
I think proportional distributions is the answer: http://www.sangakoo.com/en/unit/proportional-distributions-direct-and-inverse
Upvotes: 0
Reputation: 8228
TL;DR algorithm with best (+20%) possible accuracy, 70% slower.
Evaulated algorithms presented in accepted answer here as well as answer to python question of similar nature.
Testing results (10,000 iterations)
Algorithm | Avg Abs Diff (x lowest) | Time (x lowest)
------------------------------------------------------------------
Distribute 1 | 0.5282 (1.1992) | 00:00:00.0906921 (1.0000)
Distribute 2 | 0.4526 (1.0275) | 00:00:00.0963136 (1.0620)
Distribute 3 | 0.4405 (1.0000) | 00:00:01.1689239 (12.8889)
Distribute 4 | 0.4405 (1.0000) | 00:00:00.1548484 (1.7074)
Method 3 present has 19.9% better accuracy, for 70.7% slower execution time as expected.
Makes best effort to be as accurate as possible in distributing amount.
Sacrifices speed for accuracy by making more then one pass through the loop.
public static IEnumerable<int> Distribute3(IEnumerable<double> weights, int amount)
{
var totalWeight = weights.Sum();
var query = from w in weights
let fraction = amount * (w / totalWeight)
let integral = (int)Math.Floor(fraction)
select Tuple.Create(integral, fraction);
var result = query.ToList();
var added = result.Sum(x => x.Item1);
while (added < amount)
{
var maxError = result.Max(x => x.Item2 - x.Item1);
var index = result.FindIndex(x => (x.Item2 - x.Item1) == maxError);
result[index] = Tuple.Create(result[index].Item1 + 1, result[index].Item2);
added += 1;
}
return result.Select(x => x.Item1);
}
public static IEnumerable<int> Distribute4(IEnumerable<double> weights, int amount)
{
var totalWeight = weights.Sum();
var length = weights.Count();
var actual = new double[length];
var error = new double[length];
var rounded = new int[length];
var added = 0;
var i = 0;
foreach (var w in weights)
{
actual[i] = amount * (w / totalWeight);
rounded[i] = (int)Math.Floor(actual[i]);
error[i] = actual[i] - rounded[i];
added += rounded[i];
i += 1;
}
while (added < amount)
{
var maxError = 0.0;
var maxErrorIndex = -1;
for(var e = 0; e < length; ++e)
{
if (error[e] > maxError)
{
maxError = error[e];
maxErrorIndex = e;
}
}
rounded[maxErrorIndex] += 1;
error[maxErrorIndex] -= 1;
added += 1;
}
return rounded;
}
static void Main(string[] args)
{
Random r = new Random();
Stopwatch[] time = new[] { new Stopwatch(), new Stopwatch(), new Stopwatch(), new Stopwatch() };
double[][] results = new[] { new double[Iterations], new double[Iterations], new double[Iterations], new double[Iterations] };
for (var i = 0; i < Iterations; ++i)
{
double[] weights = new double[r.Next(MinimumWeights, MaximumWeights)];
for (var w = 0; w < weights.Length; ++w)
{
weights[w] = (r.NextDouble() * (MaximumWeight - MinimumWeight)) + MinimumWeight;
}
var amount = r.Next(MinimumAmount, MaximumAmount);
var totalWeight = weights.Sum();
var expected = weights.Select(w => (w / totalWeight) * amount).ToArray();
Action<int, DistributeDelgate> runTest = (resultIndex, func) =>
{
time[resultIndex].Start();
var result = func(weights, amount).ToArray();
time[resultIndex].Stop();
var total = result.Sum();
if (total != amount)
throw new Exception("Invalid total");
var diff = expected.Zip(result, (e, a) => Math.Abs(e - a)).Sum() / amount;
results[resultIndex][i] = diff;
};
runTest(0, Distribute1);
runTest(1, Distribute2);
runTest(2, Distribute3);
runTest(3, Distribute4);
}
}
Upvotes: 13
Reputation: 11499
This is an apportionment problem, for which there are many known methods. All have certain pathologies: the Alabama paradox, the population paradox, or a failure of the quota rule. (Balinski and Young proved that no method can avoid all three.) You'll probably want one that follows the quote rule and avoids the Alabama paradox; the population paradox isn't as much of a concern since there's no much difference in the number of days per month between different years.
Upvotes: 1
Reputation: 5913
Ok. I'm pretty certain that the original algorithm (as written) and the code posted (as written) doesn't quite answer the mail for the test case outlined by @Mathias.
My intended use of this algorithm is a slightly more specific application. Rather than calculating the % using (@amt / @SumAmt)
as shown in the original question. I have a fixed $ amount that needs to be split or spread across multiple items based on a % split defined for each of those items. The split % sums to 100%, however, straight multiplication often results in decimals that (when forced to round to whole $) don't add up to the total amount that I'm splitting apart. This is the core of the problem.
I'm fairly certain that the original answer from @Dav doesn't work in cases where (as @Mathias described) the rounded values are equal across multiple slices. This problem with the original algorithm and code can be summed up with one test case:
Take $100 and split it 3 ways using 33.333333% as your percentage.
Using the code posted by @jtw (assuming this is an accurate implementation of the original algorithm), yields you the incorrect answer of allocating $33 to each item (resulting in an overall sum of $99), so it fails the test.
I think a more accurate algorithm might be:
( [Amount to be Split] * [% to Split] )
[Remainder] + ( [UnRounded Amount] - [Rounded Amount] )
Round( [Remainder], 0 ) > 1
OR the current item is the LAST ITEM in the list, then set the item's allocation = [Rounded Amount] + Round( [Remainder], 0 )
[Rounded Amount]
Implemented in T-SQL, it looks like this:
-- Start of Code --
Drop Table #SplitList
Create Table #SplitList ( idno int , pctsplit decimal(5, 4), amt int , roundedAmt int )
-- Test Case #1
--Insert Into #SplitList Values (1, 0.3333, 100, 0)
--Insert Into #SplitList Values (2, 0.3333, 100, 0)
--Insert Into #SplitList Values (3, 0.3333, 100, 0)
-- Test Case #2
--Insert Into #SplitList Values (1, 0.20, 57, 0)
--Insert Into #SplitList Values (2, 0.20, 57, 0)
--Insert Into #SplitList Values (3, 0.20, 57, 0)
--Insert Into #SplitList Values (4, 0.20, 57, 0)
--Insert Into #SplitList Values (5, 0.20, 57, 0)
-- Test Case #3
--Insert Into #SplitList Values (1, 0.43, 10, 0)
--Insert Into #SplitList Values (2, 0.22, 10, 0)
--Insert Into #SplitList Values (3, 0.11, 10, 0)
--Insert Into #SplitList Values (4, 0.24, 10, 0)
-- Test Case #4
Insert Into #SplitList Values (1, 0.50, 75, 0)
Insert Into #SplitList Values (2, 0.50, 75, 0)
Declare @R Float
Declare @Results Float
Declare @unroundedAmt Float
Declare @idno Int
Declare @roundedAmt Int
Declare @amt Float
Declare @pctsplit Float
declare @rowCnt int
Select @R = 0
select @rowCnt = 0
-- Define the cursor
Declare SplitList Cursor For
Select idno, pctsplit, amt, roundedAmt From #SplitList Order By amt Desc
-- Open the cursor
Open SplitList
-- Assign the values of the first record
Fetch Next From SplitList Into @idno, @pctsplit, @amt, @roundedAmt
-- Loop through the records
While @@FETCH_STATUS = 0
Begin
-- Get derived Amounts from cursor
select @unroundedAmt = ( @amt * @pctsplit )
select @roundedAmt = Round( @unroundedAmt, 0 )
-- Remainder
Select @R = @R + @unroundedAmt - @roundedAmt
select @rowCnt = @rowCnt + 1
-- Magic Happens! (aka Secret Sauce)
if ( round(@R, 0 ) >= 1 ) or ( @@CURSOR_ROWS = @rowCnt ) Begin
select @Results = @roundedAmt + round( @R, 0 )
select @R = @R - round( @R, 0 )
End
else Begin
Select @Results = @roundedAmt
End
If Round(@Results, 0) <> 0
Begin
Update #SplitList Set roundedAmt = @Results Where idno = @idno
End
-- Assign the values of the next record
Fetch Next From SplitList Into @idno, @pctsplit, @amt, @roundedAmt
End
-- Close the cursor
Close SplitList
Deallocate SplitList
-- Now do the check
Select * From #SplitList
Select Sum(roundedAmt), max( amt ),
case when max(amt) <> sum(roundedamt) then 'ERROR' else 'OK' end as Test
From #SplitList
-- End of Code --
Which yields a final result set for the test case of:
idno pctsplit amt roundedAmt
1 0.3333 100 33
2 0.3333 100 34
3 0.3333 100 33
As near as I can tell (and I've got several test cases in the code), this handles all of these situations pretty gracefully.
Upvotes: 2
Reputation: 15391
The problem you have is to define what an "acceptable" rounding policy is, or in other words, what it is you are trying to minimize. Consider first this situation: you have only 2 identical items in your list, and are trying to allocate 3 units. Ideally, you would want to allocate the same amount to each item (1.5), but that is clearly not going to happen. The "best" you could do is likely to allocate 1 and 2, or 2 and 1. So
Then, I chose 1 and 2 over 0 and 3 because I assume that what you want is to minimize the difference between the perfect allocation, and the integer allocation. This might not be what you consider "a good allocation", and this is a question you need to think about: what would make an allocation better than another one?
One possible value function could be to minimize the "total error", i.e. the sum of the absolute values of the differences between your allocation and the "perfect", unconstrained allocation.
It sounds to me that something inspired by Branch and Bound could work, but it's non trivial.
Assuming that Dav solution always produces an allocation that satisfies the constraint (which I'll trust is the case), I assume that it is not guaranteed to give you the "best" solution, "best" defined by whatever distance/fit metric you end up adopting. My reason for this is that this is a greedy algorithm, which in integer programming problems can lead you to solutions which are really off the optimal solution. But if you can live with a "somewhat correct" allocation, then I say, go for it! Doing it "optimally" doesn't sound trivial.
Best of luck!
Upvotes: 2