Maruf

Reputation: 790

Python random sampling based on a distribution

Before getting to the topic, let's first take a look at Python's default sampling method:

>>> import random
>>> c=[1,2,3,100,101,102,103,104,105,106,109,110,111,112,113,114]
>>> random.sample(c,1)
[103]
>>> random.sample(c,1)
[3]
>>> random.sample(c,1)
[3]
>>> random.sample(c,1)
[2]
>>> random.sample(c,1)
[3]
>>> random.sample(c,1)
[2]
>>> random.sample(c,1)
[106]
>>> random.sample(c,1)
[3]
>>> random.sample(c,1)
[105]
>>> random.sample(c,1)
[110]
>>> random.sample(c,1)
[103]
>>> random.sample(c,1)

The source code shows what it actually does (below is the main portion of the code from the link):

selected = set()
selected_add = selected.add
for i in xrange(k):
    j = _int(random() * n)
    while j in selected:
        j = _int(random() * n)
    selected_add(j)
    result[i] = population[j]

This sampling method picks an index uniformly at random, so every member of the population is equally likely to be chosen. That means a member that should be very unlikely, for example 1 in the list above, has the same chance of being selected as any other.
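A quick way to check this is to draw many times and count how often each value comes up. This is only a sketch to illustrate the point; the seed and the number of draws (16000) are arbitrary choices of mine, not something from the source code.

```python
import random
from collections import Counter

c = [1, 2, 3, 100, 101, 102, 103, 104, 105, 106, 109, 110, 111, 112, 113, 114]

random.seed(0)  # fixed seed only so the run is repeatable

# Draw one element at a time, many times, and count each value.
draws = Counter(random.sample(c, 1)[0] for _ in range(16000))

# With uniform selection every value, including 1, should show up
# roughly 16000 / 16 = 1000 times.
for value in c:
    print(value, draws[value])
```

The counts come out close to each other, which is what uniform selection over indices implies.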

But let's concentrate on a more realistic scenario. Assume you have 16 numbers, each representing the frequency of one of the labels 0-15.

freq_array = [1, 2, 3, 100, 100, 100, 102, 102, 102, 100, 99, 50, 20, 1, 2, 3]

The index of each position is the label, and the value is the number of population members with that label. For example, from the list above, label 0 has 1 member, label 2 has 3 members, and label 3 has 100 members.

Now, if you want to select 5 members from the population, can we generate a new list that says how many members to take from each label, based on some distribution? (For the time being, let's assume a normal distribution.)

A sample output (maybe not the actual answer):

new_array = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

It means we should take 1 member from each of labels 4-8.

So maybe the question is better asked in the following manner:

How do I sample members from a population based on a normal distribution and the population frequency? (For the time being, let's restrict it to the normal distribution.)

I searched for functions in both Python's random module and np.random but could not find anything useful. Any ideas or suggestions are highly appreciated, and code too if possible.

Upvotes: 0

Views: 7827

Answers (1)

Angel Panizo

Reputation: 91

NumPy has numpy.random.normal (https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.normal.html), which lets you generate numbers from a normal distribution.

For example, to generate 100 random numbers from a normal distribution with mean 5.0 and standard deviation 1.0, use:

numpy.random.normal(loc=5.0,scale=1.0,size=100)
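To get from those draws to something like the new_array in the question, one option is to round each draw to the nearest label and count the hits per label. This is only a rough sketch: the mean (6.0), the standard deviation (1.5), and the seed are my own assumptions, and it ignores the label frequencies entirely.

```python
import numpy as np

n_labels = 16                   # labels 0-15, as in the question
rng = np.random.RandomState(0)  # seeded only so the run is repeatable

# Assumed parameters: bell curve centred on label 6 with a spread of 1.5 labels.
draws = rng.normal(loc=6.0, scale=1.5, size=5)

# Round each draw to the nearest label and keep it inside the valid range.
labels = np.clip(np.rint(draws), 0, n_labels - 1).astype(int)

# new_array[i] is how many members to take from label i.
new_array = np.bincount(labels, minlength=n_labels)
print(new_array)
```

Note that clipping pushes any out-of-range draw onto label 0 or 15, so the tails are slightly over-represented at the edges.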

Many other distributions are available. Here is the list:

https://docs.scipy.org/doc/numpy/reference/routines.random.html
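One routine from that list, numpy.random.multinomial, might be a way to take the frequencies into account as well. The sketch below weights each label by its frequency times a bell-shaped curve and splits the 5 picks according to those weights. The weighting scheme, the mean (6.0), and the standard deviation (2.0) are assumptions on my side, not something the question specifies.

```python
import numpy as np

freq = np.array([1, 2, 3, 100, 100, 100, 102, 102, 102, 100, 99, 50, 20, 1, 2, 3])
labels = np.arange(len(freq))

# Assumed bell-curve parameters: mean label 6, standard deviation 2 labels.
mean, std = 6.0, 2.0
bell = np.exp(-0.5 * ((labels - mean) / std) ** 2)

# Weight each label by both its frequency and the bell curve, then normalise
# so the weights form a probability vector.
weights = freq * bell
p = weights / weights.sum()

rng = np.random.RandomState(0)  # seeded only so the run is repeatable

# Split 5 picks across the 16 labels according to p.
new_array = rng.multinomial(5, p)
print(new_array)
```

Keep in mind that multinomial sampling is with replacement, so in principle it can assign more picks to a label than that label has members. You would need an extra check if that matters.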

Upvotes: 6
