Jack S.
Jack S.

Reputation: 2793

Generate random numbers summing to a predefined value

How to generate n pseudo-random numbers, so that the sum of them is equal to a particular value?

So here is the deal: I want to (for example) generate 4 pseudo-random numbers, that when added together would equal 40.

How could this be done in Python? I could generate a random number 1-40, then generate another number between 1 and the remainder, etc, but then the first number would have a greater chance of "grabbing" more.

Upvotes: 46

Views: 22803

Answers (9)

Jonathan H
Jonathan H

Reputation: 7933

This is what the Dirichlet-multinomial distribution does. This is an improvement on sampling from a multinomial distribution in that we don't need to assign a probability for each term to be selected.

For context, the multinomial distribution draws one term among m with set probabilities of being selected (summing to 1), repeats that n times, and then counts the number of times each term was selected, which by construction sums to n. In contrast the Dirichlet-multinomial distribution first samples the probabilities of each term to be selected in a way that is uniform among all possible combinations, and then runs the multinomial selection.

Unfortunately sampling from this distribution doesn't seem to be implemented in SciPy at the moment, so I asked a question about this. In the meantime, the following function will generate m non-negative integers summing up to n in a uniform manner across all possible splits:

import scipy.stats as sps

def random_split(n,m):
    """ Generate m non-negative integers summing to n. """
    p = sps.dirichlet.rvs(alpha=[1]*m, size=1)
    return sps.multinomial.rvs(n=n, p=p[0], size=1)

If we want to further impose that the integers be positive, we can subtract m from the sum-total and manually add 1 to all terms:

import scipy.stats as sps

def random_split_positive(n,m):
    """ Generate m positive integers summing to n. """
    p = sps.dirichlet.rvs(alpha=[1]*m, size=1)
    return 1+sps.multinomial.rvs(n=n-m, p=p[0], size=1)

Finally for completeness, if neither the terms nor the sum total are constrained to be integers, the Dirichlet distribution will happily find a split as well:

import scipy.stats as sps

def random_split_float(s,m):
    """ Generate m non-negative numbers summing to s. """
    return s*sps.dirichlet.rvs(alpha=[1]*m, size=1)

Upvotes: 2

charrison
charrison

Reputation: 857

Here's a humble improvement to @Mark Dickinson's version to allow zeros in the generated integers (so that they are non-negative, rather than positive):

import random

def constrained_sum_sample_pos(n, total):
    """Return a randomly chosen list of n non-negative integers summing to total.
    Each such list is equally likely to occur."""

    dividers = sorted(random.choices(range(0, total), k=n-1))
    return [a - b for a, b in zip(dividers + [total], [0] + dividers)]

The function random.choices() samples with replacement, as opposed to random.sample() which samples without replacement. It's new from Python 3.6.

Upvotes: 1

Ruggero Turra
Ruggero Turra

Reputation: 17670

Use multinomial distribution

from numpy.random import multinomial
multinomial(40, [1/4.] * 4)

Each variable will be distributed as a binomial distribution with mean n * p equal to 40 * 1/4 = 10 in this example.

Upvotes: 22

Samir
Samir

Reputation: 29

If you want true randomness then use:

import numpy as np
def randofsum_unbalanced(s, n):
    # Where s = sum (e.g. 40 in your case) and n is the output array length (e.g. 4 in your case)
    r = np.random.rand(n)
    a = np.array(np.round((r/np.sum(r))*s,0),dtype=int)
    while np.sum(a) > s:
        a[np.random.choice(n)] -= 1
    while np.sum(a) < s:
        a[np.random.choice(n)] += 1
    return a

If you want a greater level of uniformity then take advantage of the multinomial distribution:

def randofsum_balanced(s, n):
    return np.random.multinomial(s,np.ones(n)/n,size=1)[0]

Upvotes: 0

John Mee
John Mee

Reputation: 52233

Building on @markdickonson by providing some control over distribution between the divisors. I introduce a variance/jiggle as a percentage of the uniform distance between each.

 def constrained_sum_sample(n, total, variance=50):
    """Return a random-ish list of n positive integers summing to total.

    variance: int; percentage of the gap between the uniform spacing to vary the result.
    """
    divisor = total/n
    jiggle = divisor * variance / 100 / 2
    dividers = [int((x+1)*divisor + random.random()*jiggle) for x in range(n-1)]
    result = [a - b for a, b in zip(dividers + [total], [0] + dividers)]
    return result

Sample output:

[12, 8, 10, 10]
[10, 11, 10, 9]
[11, 9, 11, 9]
[11, 9, 12, 8]

The idea remains to divide the population equally, then randomly move them left or right within the given range. Since each value is still bound to the uniform point we don't have to worry about it drifting.

Good enough for my purposes, but not perfect. eg: the first number will always vary higher, and the last will always vary lower.

Upvotes: -1

Mark Dickinson
Mark Dickinson

Reputation: 30561

Here's the standard solution. It's similar to Laurence Gonsalves' answer, but has two advantages over that answer.

  1. It's uniform: each combination of 4 positive integers adding up to 40 is equally likely to come up with this scheme.

and

  1. it's easy to adapt to other totals (7 numbers adding up to 100, etc.)
import random

def constrained_sum_sample_pos(n, total):
    """Return a randomly chosen list of n positive integers summing to total.
    Each such list is equally likely to occur."""

    dividers = sorted(random.sample(range(1, total), n - 1))
    return [a - b for a, b in zip(dividers + [total], [0] + dividers)]

Sample outputs:

>>> constrained_sum_sample_pos(4, 40)
[4, 4, 25, 7]
>>> constrained_sum_sample_pos(4, 40)
[9, 6, 5, 20]
>>> constrained_sum_sample_pos(4, 40)
[11, 2, 15, 12]
>>> constrained_sum_sample_pos(4, 40)
[24, 8, 3, 5]

Explanation: there's a one-to-one correspondence between (1) 4-tuples (a, b, c, d) of positive integers such that a + b + c + d == 40, and (2) triples of integers (e, f, g) with 0 < e < f < g < 40, and it's easy to produce the latter using random.sample. The correspondence is given by (e, f, g) = (a, a + b, a + b + c) in one direction, and (a, b, c, d) = (e, f - e, g - f, 40 - g) in the reverse direction.

If you want nonnegative integers (i.e., allowing 0) instead of positive ones, then there's an easy transformation: if (a, b, c, d) are nonnegative integers summing to 40 then (a+1, b+1, c+1, d+1) are positive integers summing to 44, and vice versa. Using this idea, we have:

def constrained_sum_sample_nonneg(n, total):
    """Return a randomly chosen list of n nonnegative integers summing to total.
    Each such list is equally likely to occur."""

    return [x - 1 for x in constrained_sum_sample_pos(n, total + n)]

Graphical illustration of constrained_sum_sample_pos(4, 10), thanks to @FM. (Edited slightly.)

0 1 2 3 4 5 6 7 8 9 10  # The universe.
|                    |  # Place fixed dividers at 0, 10.
|   |     |       |  |  # Add 4 - 1 randomly chosen dividers in [1, 9]
  a    b      c    d    # Compute the 4 differences: 2 3 4 1

Upvotes: 123

Laurence Gonsalves
Laurence Gonsalves

Reputation: 143064

b = random.randint(2, 38)
a = random.randint(1, b - 1)
c = random.randint(b + 1, 39)
return [a, b - a, c - b, 40 - c]

(I assume you wanted integers since you said "1-40", but this could be easily generalized for floats.)

Here's how it works:

  • cut the total range in two randomly, that's b. The odd range is because there are going to be at least 2 below the midpoint and at least 2 above. (This comes from your 1 minimum on each value).
  • cut each of those ranges in two randomly. Again, the bounds are to account for the 1 minimum.
  • return the size of each slice. They'll add up to 40.

Upvotes: 13

Jim Lewis
Jim Lewis

Reputation: 45025

There are only 37^4 = 1,874,161 arrangements of four integers in the range [1,37] (with repeats allowed). Enumerate them, saving and counting the permutations that add up to 40. (This will be a much smaller number, N).

Draw uniformly distributed random integers K in the interval [0, N-1] and return the K-th permutation. This can easily be seen to guarantee a uniform distribution over the space of possible outcomes, with each sequence position identically distributed. (Many of the answers I'm seeing will have the final choice biased lower than the first three!)

Upvotes: 2

lalli
lalli

Reputation: 6303

Generate 4 random numbers, compute their sum, divide each one by the sum and multiply by 40.

If you want Integers, then this will require a little non-randomness.

Upvotes: 8

Related Questions