Reputation: 291

Find proportional sampling using Python

I'm given a problem that explicitly asks me not to use numpy or pandas.

Problem: Select an element from the list A randomly, with probability proportional to its magnitude. Assume we repeat the same experiment 100 times with replacement; in each experiment, print the number that is selected randomly from A.

Ex 1: A = [0 5 27 6 13 28 100 45 10 79]
Let f(x) denote the number of times x is selected in 100 experiments.
f(100) > f(79) > f(45) > f(28) > f(27) > f(13) > f(10) > f(6) > f(5) > f(0)

Initially, I took the sum of all the elements of list A.

I then divided each element of list A by the sum (in order to normalize) and stored each of these values in another list (d_dash).

I then created another empty list (d_bar) that holds the cumulative sum of the elements of d_dash.

I created a variable r, where r = random.uniform(0.0, 1.0), and then, for the length of d_bar, compared r to d_bar[k]: if r <= d_bar[k], return A[k].

However, I'm getting the error "list index out of range" near d_dash[j].append((A[j]/sum)). I'm not sure what the issue is, as I did not exceed the index of either d_dash or A.

Also, is my logic correct? Sharing a better way to do this would be appreciated.

Thanks in advance.

import random

A = [0,5,27,6,13,28,100,45,10,79]

def propotional_sampling(A):
    sum=0
    for i in range(len(A)):
        sum = sum + A[i]

    d_dash=[]

    for j in range(len(A)):
        d_dash[j].append((A[j]/sum))

    #cumulative sum

    d_bar =[]
    d_bar[0]= 0

    for k in range(len(A)):
        d_bar[k] = d_bar[k] + d_dash[k]

    r = random.uniform(0.0,1.0)
    number=0

    for p in range(len(d_bar)):
        if(r<=d_bar[p]):
            number=d_bar[p]
    return number

def sampling_based_on_magnitued():
    for i in range(1,100):
        number = propotional_sampling(A)
        print(number)

sampling_based_on_magnitued()

Upvotes: 2

Answers (3)

Hardik Vagadia

Reputation: 375

Below is code that does this:

import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

# Sum of all the elements in the list
S = sum(A)

# Normalized values: each element divided by the total
norm_sum = [ele / S for ele in A]

# Cumulative sum of the normalized values
cum_norm_sum = []
cum_norm_sum.append(norm_sum[0])
for itr in range(1, len(norm_sum)):
    cum_norm_sum.append(cum_norm_sum[-1] + norm_sum[itr])

def prop_sampling(cum_norm_sum):
    """
    This function returns an element of A
    with proportional sampling.
    """
    r = random.random()
    for itr in range(len(cum_norm_sum)):
        if r < cum_norm_sum[itr]:
            return A[itr]
    # Guard against floating-point rounding leaving the last cumulative value just below 1
    return A[-1]

# Sampling 1000 elements from the given list with proportional sampling
sampled_elements = []
for itr in range(1000):
    sampled_elements.append(prop_sampling(cum_norm_sum))

[Image: frequency of each element among the sampled points]

The number of times each element appears is roughly proportional to its magnitude.
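
The proportionality claim can also be checked numerically with a short tally. The sketch below is an illustration and not part of the original answer: the helper name `sample_counts` and the fixed seed are assumptions, and it uses only the standard library.

```python
import random
from collections import Counter
from itertools import accumulate

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def sample_counts(values, n=1000, seed=0):
    """Draw n proportional samples and count how often each value appears."""
    rng = random.Random(seed)          # fixed seed only to make runs repeatable
    cum = list(accumulate(values))     # running totals; the last entry is the grand total
    total = cum[-1]
    counts = Counter()
    for _ in range(n):
        r = rng.random() * total       # r lies in [0, total)
        for value, bound in zip(values, cum):
            if r < bound:              # first running total strictly above r
                counts[value] += 1
                break
    return counts

counts = sample_counts(A)
total = sum(A)
for value in sorted(A, reverse=True):
    expected = 1000 * value / total
    print(f"{value:>3}: observed {counts[value]:>4}, expected about {expected:.1f}")
```

With 1000 draws the observed counts should sit close to the expected ones, although neighbouring values such as 27 and 28 can easily swap order in any single run.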

Upvotes: 1

Yellow_truffle

Reputation: 923

The reason you get the "list index out of range" message is that you index into an empty list. You create "d_bar = []" and then start assigning values to it with "d_bar[k] = d_bar[k] + d_dash[k]"; "d_dash[j].append(...)" fails for the same reason, because d_dash is still empty. I recommend the following structure instead. First, define the list this way:

d_bar=[0 for i in range(len(A))]
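
To illustrate why the index assignment fails, here is a minimal sketch (the variable names `empty` and `prefilled` are made up for the example): assigning to an index of an empty list raises IndexError, whereas `append` or a pre-sized list works.

```python
empty = []
try:
    empty[0] = 0.5            # there is no slot 0 yet, so this raises IndexError
except IndexError as exc:
    print("IndexError:", exc)

empty.append(0.5)             # append grows the list by one element
prefilled = [0] * 3           # pre-sized list: indices 0..2 already exist
prefilled[2] = 0.5            # assigning to an existing index is fine
print(empty, prefilled)
```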

Also, I believe this code will return 1 every time, as there is no break in the loop, so the last cumulative value always overwrites the result. You can resolve this issue by adding "break". Here is an updated version of your code:

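import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def pick_a_number_from_list(A):
    # Total of all elements (named total to avoid shadowing the built-in sum)
    total = 0
    for i in A:
        total += i
    # Normalize each element by the total
    A_norm = []
    for j in A:
        A_norm.append(j / total)
    # Cumulative sum of the normalized values
    A_cum = [0 for i in range(len(A))]
    A_cum[0] = A_norm[0]
    for k in range(len(A_norm) - 1):
        A_cum[k + 1] = A_cum[k] + A_norm[k + 1]

    r = random.uniform(0.0, 1.0)
    number = 0

    # Return the first element whose cumulative value reaches r
    for p in range(len(A_cum)):
        if r <= A_cum[p]:
            number = A[p]
            break
    return number

def sampling_based_on_magnitued():
    # range(100) runs exactly 100 experiments; range(1, 100) would run only 99
    for i in range(100):
        number = pick_a_number_from_list(A)
        print(number)

sampling_based_on_magnitued()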

Upvotes: 0

Andrej Kesely

Reputation: 195408

The cumulative sum can be computed with itertools.accumulate. The loop:

for p in range(len(d_bar)):
    if(r<=d_bar[p]):
        number=d_bar[p]

can be replaced by bisect.bisect():

import random
from itertools import accumulate
from bisect import bisect

A = [0,5,27,6,13,28,100,45,10,79]

def propotional_sampling(A, n=100):
    # calculate cumulative sum from A:
    cum_sum = [*accumulate(A)]
    # cum_sum = [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]

    out = []
    for _ in range(n):
        i = random.random()                     # i is in [0.0, 1.0)
        idx = bisect(cum_sum, i*cum_sum[-1])    # get index to list A
        out.append(A[idx])

    return out

print(propotional_sampling(A))

Prints (for example):

[10, 100, 100, 79, 28, 45, 45, 27, 79, 79, 79, 79, 100, 27, 100, 100, 100, 13, 45, 100, 5, 100, 45, 79, 100, 28, 79, 79, 6, 45, 27, 28, 27, 79, 100, 79, 79, 28, 100, 79, 45, 100, 10, 28, 28, 13, 79, 79, 79, 79, 28, 45, 45, 100, 28, 27, 79, 27, 45, 79, 45, 100, 28, 100, 100, 5, 100, 79, 28, 79, 13, 100, 100, 79, 28, 100, 79, 13, 27, 100, 28, 10, 27, 28, 100, 45, 79, 100, 100, 100, 28, 79, 100, 45, 28, 79, 79, 5, 45, 28]
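
As an aside that is not part of the original answer: if the exercise only rules out numpy and pandas, the standard library's `random.choices` (Python 3.6+) accepts a `weights` argument and performs this kind of weighted sampling with replacement. A minimal sketch, where the function name `sample_with_choices` is an assumption:

```python
import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def sample_with_choices(values, n=100):
    # Each value is used as its own weight, so the zero-weight entry is never drawn
    return random.choices(values, weights=values, k=n)

picked = sample_with_choices(A)
print(picked)
```

This gives the same kind of result as the accumulate/bisect version above; whether it is acceptable depends on how strictly the assignment expects the sampling to be written by hand.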

Upvotes: 0
