Reputation: 10998

Proportionately distribute (prorate) a value across a set of values

I have a need to write code that will prorate a value across a list, based on the relative weights of "basis" values in the list. Simply dividing the "basis" values by the sum of the "basis" values and then multiplying the factor by the original value to prorate works to a certain degree:

proratedValue = (basis / basisTotal) * prorationAmount;

However, the result of this calculation must then be rounded to integer values. The effect of the rounding means that the the sum of proratedValue for all items in the list may differ from the original prorationAmount.

Can anyone explain how to apply a "lossless" proration algorithm that proportionately distributes a value across a list as accurately as possible, without suffering from rounding errors?

Upvotes: 25

Answers (6)

Amber

Reputation: 527328

Simple algorithm sketch here...

Have a running total which starts at zero.
Do your standard "divide basis by total basis, then multiply by proportion amount" for the first item.
Store the original value of the running total elsewhere, then add the amount you just calculated in #2.
Round both the old value and the new value of the running total to integers (don't modify the existing values, round them into separate variables), and take the difference.
The number calculated in step 4 is the value assigned to the current basis.
Repeat steps #2-5 for each basis.

This is guaranteed to have the total amount prorated equal to the input prorate amount, because you never actually modify the running total itself (you only take rounded values of it for other calculations, you don't write them back). What would have been an issue with integer rounding before is now dealt with, since the rounding error will add up over time in the running total and eventually push a value across the rounding threshold in the other direction.

Basic example:

Input basis: [0.2, 0.3, 0.3, 0.2]
Total prorate: 47

----

R used to indicate running total here:

R = 0

First basis:
  oldR = R [0]
  R += (0.2 / 1.0 * 47) [= 9.4]
  results[0] = int(R) - int(oldR) [= 9]

Second basis:
  oldR = R [9.4]
  R += (0.3 / 1.0 * 47) [+ 14.1, = 23.5 total]
  results[1] = int(R) - int(oldR) [23-9, = 14]

Third basis:
  oldR = R [23.5]
  R += (0.3 / 1.0 * 47) [+ 14.1, = 37.6 total]
  results[1] = int(R) - int(oldR) [38-23, = 15]

Fourth basis:
  oldR = R [37.6]
  R += (0.2 / 1.0 * 47) [+ 9.4, = 47 total]
  results[1] = int(R) - int(oldR) [47-38, = 9]

9+14+15+9 = 47

Upvotes: 19

mmmmmm

Reputation: 1020

I think proportional distributions is the answer: http://www.sangakoo.com/en/unit/proportional-distributions-direct-and-inverse

Upvotes: 0

Joseph Kingry

Reputation: 8228

TL;DR algorithm with best (+20%) possible accuracy, 70% slower.

Evaulated algorithms presented in accepted answer here as well as answer to python question of similar nature.

Distribute 1 - based on Amber's algorithm
Distribute 2 - based on John Machin's algorithm
Distribute 3 - see below
Distribute 4 - optimized version of Distribute 3 (eg. removed LINQ, used arrays)

Testing results (10,000 iterations)

Algorithm    | Avg Abs Diff (x lowest) | Time (x lowest)     
------------------------------------------------------------------
Distribute 1 | 0.5282 (1.1992)         | 00:00:00.0906921 (1.0000)
Distribute 2 | 0.4526 (1.0275)         | 00:00:00.0963136 (1.0620)
Distribute 3 | 0.4405 (1.0000)         | 00:00:01.1689239 (12.8889)
Distribute 4 | 0.4405 (1.0000)         | 00:00:00.1548484 (1.7074)

Method 3 present has 19.9% better accuracy, for 70.7% slower execution time as expected.

Distribute 3

Makes best effort to be as accurate as possible in distributing amount.

Distribute weights as normal
Increment weights with highest error until actual distributed amount equals expected amount

Sacrifices speed for accuracy by making more then one pass through the loop.

public static IEnumerable<int> Distribute3(IEnumerable<double> weights, int amount)
{
    var totalWeight = weights.Sum();
    var query = from w in weights
                let fraction = amount * (w / totalWeight)
                let integral = (int)Math.Floor(fraction)
                select Tuple.Create(integral, fraction);

    var result = query.ToList();
    var added = result.Sum(x => x.Item1);

    while (added < amount)
    {
        var maxError = result.Max(x => x.Item2 - x.Item1);
        var index = result.FindIndex(x => (x.Item2 - x.Item1) == maxError);
        result[index] = Tuple.Create(result[index].Item1 + 1, result[index].Item2);
        added += 1;
    }

    return result.Select(x => x.Item1);
}

Distribute 4

public static IEnumerable<int> Distribute4(IEnumerable<double> weights, int amount)
{
    var totalWeight = weights.Sum();
    var length = weights.Count();

    var actual = new double[length];
    var error = new double[length];
    var rounded = new int[length];

    var added = 0;

    var i = 0;
    foreach (var w in weights)
    {
        actual[i] = amount * (w / totalWeight);
        rounded[i] = (int)Math.Floor(actual[i]);
        error[i] = actual[i] - rounded[i];
        added += rounded[i];
        i += 1;
    }

    while (added < amount)
    {
        var maxError = 0.0;
        var maxErrorIndex = -1;
        for(var e = 0; e  < length; ++e)
        {
            if (error[e] > maxError)
            {
                maxError = error[e];
                maxErrorIndex = e;
            }
        }

        rounded[maxErrorIndex] += 1;
        error[maxErrorIndex] -= 1;

        added += 1;
    }

    return rounded;
}

Test Harness

static void Main(string[] args)
{
    Random r = new Random();

    Stopwatch[] time = new[] { new Stopwatch(), new Stopwatch(), new Stopwatch(), new Stopwatch() };

    double[][] results = new[] { new double[Iterations], new double[Iterations], new double[Iterations], new double[Iterations] };

    for (var i = 0; i < Iterations; ++i)
    {
        double[] weights = new double[r.Next(MinimumWeights, MaximumWeights)];
        for (var w = 0; w < weights.Length; ++w)
        {
            weights[w] = (r.NextDouble() * (MaximumWeight - MinimumWeight)) + MinimumWeight;
        }
        var amount = r.Next(MinimumAmount, MaximumAmount);

        var totalWeight = weights.Sum();
        var expected = weights.Select(w => (w / totalWeight) * amount).ToArray();

        Action<int, DistributeDelgate> runTest = (resultIndex, func) =>
            {
                time[resultIndex].Start();
                var result = func(weights, amount).ToArray();
                time[resultIndex].Stop();

                var total = result.Sum();

                if (total != amount)
                    throw new Exception("Invalid total");

                var diff = expected.Zip(result, (e, a) => Math.Abs(e - a)).Sum() / amount;

                results[resultIndex][i] = diff;
            };

        runTest(0, Distribute1);
        runTest(1, Distribute2);
        runTest(2, Distribute3);
        runTest(3, Distribute4);
    }
}

Upvotes: 13

Charles

Reputation: 11499

This is an apportionment problem, for which there are many known methods. All have certain pathologies: the Alabama paradox, the population paradox, or a failure of the quota rule. (Balinski and Young proved that no method can avoid all three.) You'll probably want one that follows the quote rule and avoids the Alabama paradox; the population paradox isn't as much of a concern since there's no much difference in the number of days per month between different years.

Upvotes: 1

Jay Stevens

Reputation: 5913

Ok. I'm pretty certain that the original algorithm (as written) and the code posted (as written) doesn't quite answer the mail for the test case outlined by @Mathias.

My intended use of this algorithm is a slightly more specific application. Rather than calculating the % using (@amt / @SumAmt) as shown in the original question. I have a fixed $ amount that needs to be split or spread across multiple items based on a % split defined for each of those items. The split % sums to 100%, however, straight multiplication often results in decimals that (when forced to round to whole $) don't add up to the total amount that I'm splitting apart. This is the core of the problem.

I'm fairly certain that the original answer from @Dav doesn't work in cases where (as @Mathias described) the rounded values are equal across multiple slices. This problem with the original algorithm and code can be summed up with one test case:

Take $100 and split it 3 ways using 33.333333% as your percentage.

Using the code posted by @jtw (assuming this is an accurate implementation of the original algorithm), yields you the incorrect answer of allocating $33 to each item (resulting in an overall sum of $99), so it fails the test.

I think a more accurate algorithm might be:

Have a running total which starts at 0
For each item in the group:
Calculate the un-rounded allocation amount as ( [Amount to be Split] * [% to Split] )
Calculate the cumulative Remainder as [Remainder] + ( [UnRounded Amount] - [Rounded Amount] )
If Round( [Remainder], 0 ) > 1 OR the current item is the LAST ITEM in the list, then set the item's allocation = [Rounded Amount] + Round( [Remainder], 0 )
else set item's allocation = [Rounded Amount]
Repeat for next item

Implemented in T-SQL, it looks like this:

-- Start of Code --
Drop Table #SplitList
Create Table #SplitList ( idno int , pctsplit decimal(5, 4), amt int , roundedAmt int )

-- Test Case #1
--Insert Into #SplitList Values (1, 0.3333, 100, 0)
--Insert Into #SplitList Values (2, 0.3333, 100, 0)
--Insert Into #SplitList Values (3, 0.3333, 100, 0)

-- Test Case #2
--Insert Into #SplitList Values (1, 0.20, 57, 0)
--Insert Into #SplitList Values (2, 0.20, 57, 0)
--Insert Into #SplitList Values (3, 0.20, 57, 0)
--Insert Into #SplitList Values (4, 0.20, 57, 0)
--Insert Into #SplitList Values (5, 0.20, 57, 0)

-- Test Case #3
--Insert Into #SplitList Values (1, 0.43, 10, 0)
--Insert Into #SplitList Values (2, 0.22, 10, 0)
--Insert Into #SplitList Values (3, 0.11, 10, 0)
--Insert Into #SplitList Values (4, 0.24, 10, 0)

-- Test Case #4
Insert Into #SplitList Values (1, 0.50, 75, 0)
Insert Into #SplitList Values (2, 0.50, 75, 0)

Declare @R Float
Declare @Results Float
Declare @unroundedAmt Float
Declare @idno Int
Declare @roundedAmt Int
Declare @amt Float
Declare @pctsplit Float
declare @rowCnt int

Select @R = 0
select @rowCnt = 0

-- Define the cursor 
Declare SplitList Cursor For 
Select idno, pctsplit, amt, roundedAmt From #SplitList Order By amt Desc
-- Open the cursor
Open SplitList

-- Assign the values of the first record
Fetch Next From SplitList Into @idno, @pctsplit, @amt, @roundedAmt
-- Loop through the records
While @@FETCH_STATUS = 0

Begin
    -- Get derived Amounts from cursor
    select @unroundedAmt = ( @amt * @pctsplit )
    select @roundedAmt = Round( @unroundedAmt, 0 )

    -- Remainder
    Select @R = @R + @unroundedAmt - @roundedAmt
    select @rowCnt = @rowCnt + 1

    -- Magic Happens!  (aka Secret Sauce)
    if ( round(@R, 0 ) >= 1 ) or ( @@CURSOR_ROWS = @rowCnt ) Begin
        select @Results = @roundedAmt + round( @R, 0 )
        select @R = @R - round( @R, 0 )
    End
    else Begin
        Select @Results = @roundedAmt
    End

    If Round(@Results, 0) <> 0
    Begin
        Update #SplitList Set roundedAmt = @Results Where idno = @idno
    End

    -- Assign the values of the next record
    Fetch Next From SplitList Into @idno, @pctsplit, @amt, @roundedAmt
End

-- Close the cursor
Close SplitList
Deallocate SplitList

-- Now do the check
Select * From #SplitList
Select Sum(roundedAmt), max( amt ), 
case when max(amt) <> sum(roundedamt) then 'ERROR' else 'OK' end as Test 
From #SplitList

-- End of Code --

Which yields a final result set for the test case of:

idno   pctsplit   amt     roundedAmt
1      0.3333    100     33
2      0.3333    100     34
3      0.3333    100     33

As near as I can tell (and I've got several test cases in the code), this handles all of these situations pretty gracefully.

Upvotes: 2

Mathias

Reputation: 15391

The problem you have is to define what an "acceptable" rounding policy is, or in other words, what it is you are trying to minimize. Consider first this situation: you have only 2 identical items in your list, and are trying to allocate 3 units. Ideally, you would want to allocate the same amount to each item (1.5), but that is clearly not going to happen. The "best" you could do is likely to allocate 1 and 2, or 2 and 1. So

there might be multiple solutions to each allocation
identical items may not receive an identical allocation

Then, I chose 1 and 2 over 0 and 3 because I assume that what you want is to minimize the difference between the perfect allocation, and the integer allocation. This might not be what you consider "a good allocation", and this is a question you need to think about: what would make an allocation better than another one?
One possible value function could be to minimize the "total error", i.e. the sum of the absolute values of the differences between your allocation and the "perfect", unconstrained allocation.
It sounds to me that something inspired by Branch and Bound could work, but it's non trivial.
Assuming that Dav solution always produces an allocation that satisfies the constraint (which I'll trust is the case), I assume that it is not guaranteed to give you the "best" solution, "best" defined by whatever distance/fit metric you end up adopting. My reason for this is that this is a greedy algorithm, which in integer programming problems can lead you to solutions which are really off the optimal solution. But if you can live with a "somewhat correct" allocation, then I say, go for it! Doing it "optimally" doesn't sound trivial.
Best of luck!