maxischl

Reputation: 621

A weighted version of random.randint

I would like to choose a random integer between a and b (both included), weighted toward a value c.

c is a value between a and b.

What is the most efficient way to apply the weight factor c to random.randint?

The closest I got was this question, but there is a big difference:

I have only one single statistical weight c, not a statistical probability for every value between a and b.

Example:

import random

a = 890
b = 3200

c = 2600

print(random.randint(a, b))

# should most frequently produce a value around 2600

I don't really care about the distribution between a and b, as long as there is a weight on c. However, a Gaussian distribution would be appreciated.

Please note: this question does not address the numpy.random module, unlike this question.

Upvotes: 2

Views: 2594

Answers (2)

pjs

Reputation: 19855

Sounds like the triangular distribution might fit your needs. The values a and b are the min and max, respectively, and c corresponds to the mode (most likely outcome) of the distribution.

There is a triangular generator in numpy.random. It generates floats, but you can round the results and convert them to integers. Strictly speaking, this is slightly biased against the min and max: each of them gets only half the rounding range, and thus half the expected count, compared to the other integer values. Statisticians adjust for this conversion from reals to ints with a continuity correction: subtract 1/2 from the min and add 1/2 to the max. This matters most for small ranges, as in the tiny example below.

import numpy as np
import matplotlib.pyplot as plt

# replace with your actual values
a = 1
b = 5
c = 2

# Without continuity correction
plt.hist(np.round(np.random.triangular(
             left=a,
             mode=c,
             right=b,
             size=100000)
         ).astype(int),
         range=(0.5, 5.5), bins=50, density=True)
plt.show()

# With continuity correction
plt.hist(np.round(np.random.triangular(
             left=a - 0.5,
             mode=c,
             right=b + 0.5,
             size=100000)
         ).astype(int),
         range=(0.5, 5.5), bins=50, density=True)
plt.show()
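Since the question is about the standard random module, the same idea can be sketched without numpy using random.triangular(low, high, mode). The helper name triangular_randint, its correct flag, and the choice of floor(x + 0.5) for rounding are assumptions made for illustration, not part of the answer above. The script tallies counts instead of plotting them.

```python
import math
import random
from collections import Counter


def triangular_randint(a, b, c, rng=random, correct=True):
    """Random integer in [a, b] with its mode near c (hypothetical helper)."""
    # Continuity correction: widen the real interval by 1/2 on each side
    # so that a and b get a full-width rounding bin.
    half = 0.5 if correct else 0.0
    x = rng.triangular(a - half, b + half, c)  # arguments: low, high, mode
    # Round half up, then clamp in case x lands exactly on an outer edge.
    return min(b, max(a, math.floor(x + 0.5)))


rng = random.Random(12345)  # fixed seed so the tally is reproducible
n = 100_000
plain = Counter(triangular_randint(1, 5, 2, rng, correct=False) for _ in range(n))
fixed = Counter(triangular_randint(1, 5, 2, rng, correct=True) for _ in range(n))

# Columns: value, count without correction, count with correction
for value in range(1, 6):
    print(value, plain[value], fixed[value])
```

With the correction, the end points 1 and 5 should show up noticeably more often than without it, while 2 remains the most frequent value.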

Here are the results with your actual parameterization:

# Actual target case
a = 890
b = 3200
c = 2600
plt.hist(np.round(np.random.triangular(
             left=a - 0.5,
             mode=c,
             right=b + 0.5,
             size=100000)
         ).astype(int),
         range=(500, 3500), bins=300, density=True)
plt.show()

[Figure: histogram of the generated triangular distribution]

Note that unlike the normal distribution suggested in comments, this is guaranteed to stay between a and b.
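If a Gaussian shape is still preferred, one common workaround for the range problem is to redraw any sample that falls outside a..b (rejection sampling). The sketch below uses random.gauss from the standard library. The helper name gauss_randint and the spread sigma = 300 are assumptions for illustration only; the question does not specify a spread, and the resulting distribution is a truncated normal, not a true normal.

```python
import random


def gauss_randint(a, b, c, sigma, rng=random):
    """Integer in [a, b] from a normal distribution centred on c.

    Samples that round outside the range are redrawn (hypothetical helper).
    """
    if not a <= c <= b:
        raise ValueError("c must lie between a and b")
    while True:
        value = round(rng.gauss(c, sigma))
        if a <= value <= b:
            return value


rng = random.Random(2024)  # fixed seed for reproducibility
# sigma = 300 is an arbitrary spread chosen for this illustration
samples = [gauss_randint(890, 3200, 2600, 300, rng) for _ in range(10_000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```

Because c is required to lie inside the range, a large share of the draws is accepted on the first try, so the loop terminates quickly in practice.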

Upvotes: 3

Patrick Artner

Reputation: 51683

You can use

random.choices(range(a, b + 1), weights=[....], k=1)  # or cum_weights

with k=1, the population range(a, b + 1), and the weights you want. Note that random.choices returns a list, even for k=1.

See: https://docs.python.org/3/library/random.html#random.choices
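As a minimal sketch of the call, here is one literal reading of "a single weight on c": every value gets weight 1 and c alone gets a larger weight. The factor boost is a hypothetical parameter chosen for illustration; nothing in the question fixes its value.

```python
import random

a, b, c = 890, 3200, 2600
boost = 500  # hypothetical factor: c is 500 times as likely as any other value

population = range(a, b + 1)
weights = [boost if i == c else 1 for i in population]

# random.choices returns a list, so unpack the single element for k=1
(value,) = random.choices(population, weights=weights, k=1)
print(value)
```

This puts a spike on c itself and leaves every other value equally likely, so it does not favour values near c.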


You would have to calculate a possible (arbitrary) weighting, e.g.:

import random
from collections import defaultdict
a = 8
b = 32

c = 26

# hacked distribution
w = [(i-a)**2 if i <= c else (b-i+a)**2 for i in range(a,b+1)]

d=defaultdict(int)
for i in range(a,b+1):
    d[i]=0

# test for 10k numbers
for num in random.choices(range(a,b+1), weights = w, k=10000):
    d[num] += 1

print(w)
print(d)

It is still random; one run gave me:

# hacked distribution
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 
  256, 289, 324, 169, 144, 121, 100, 81, 64]

# test for 10k numbers
{8: 0, 9: 8, 10: 7, 11: 37, 12: 61, 13: 94, 14: 149, 15: 175, 16: 229, 
 17: 283, 18: 374, 19: 450, 20: 493, 21: 628, 22: 672, 23: 820, 24: 907, 
 25: 1038, 26: 1183, 27: 564, 28: 537, 29: 435, 30: 325, 31: 293, 32: 238}
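Note that the squared weighting above gives a the weight 0 (since i - a is 0 there), so a itself is never drawn. As an alternative sketch, the weights can follow a bell curve centred on c, which keeps every value possible and matches the Gaussian shape the question mentions. The spread sigma = 4.0 is an assumption for illustration. Precomputing cum_weights once with itertools.accumulate avoids rebuilding the cumulative sums on every call.

```python
import math
import random
from collections import Counter
from itertools import accumulate

a, b, c = 8, 32, 26
sigma = 4.0  # hypothetical spread, chosen only for this illustration

population = range(a, b + 1)
# Unnormalised Gaussian weights; random.choices does not need them to sum to 1
weights = [math.exp(-0.5 * ((i - c) / sigma) ** 2) for i in population]
cum_weights = list(accumulate(weights))  # computed once, reused for every draw

rng = random.Random(7)  # fixed seed for reproducibility
draws = rng.choices(population, cum_weights=cum_weights, k=10_000)
counts = Counter(draws)

for i in population:
    print(i, counts[i])
```

With such a small sigma, values far from c (such as a here) get a very small but nonzero weight, so they may still not appear in a run of 10k draws.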

Upvotes: 2
