Reputation: 291
I'm given a problem that explicitly asks me not to use numpy or pandas.
Problem: Select an element from the list A randomly, with probability proportional to its magnitude. Assume we run the same experiment 100 times with replacement; in each experiment, print the number that was selected randomly from A.
Ex 1: A = [0 5 27 6 13 28 100 45 10 79]
Let f(x) denote the number of times x is selected in 100 experiments.
f(100) > f(79) > f(45) > f(28) > f(27) > f(13) > f(10) > f(6) > f(5) > f(0)
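To make "probability proportional to its magnitude" concrete, here is a small sketch that computes each element's selection probability as its value divided by the sum of A, and the expected count over 100 draws. The names `total`, `probabilities`, and `expected_counts` are illustrative and not part of the problem statement.

```python
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
total = sum(A)  # 313
n_draws = 100

# Probability of each element is its value divided by the total.
probabilities = {x: x / total for x in A}
# Expected number of selections in n_draws independent draws.
expected_counts = {x: n_draws * p for x, p in probabilities.items()}

for x in sorted(A, reverse=True):
    print(f"{x:>3}: p = {probabilities[x]:.4f}, expected count = {expected_counts[x]:.2f}")
```

Note that the ordering of f above holds on average only: the expected counts for 27 and 28 are about 8.6 and 8.9, so a single run of 100 draws can easily swap them, while 0 has probability zero and should never appear.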
Initially, I took the sum of all the elements of list A.
I then divided each element of list A by the sum (in order to normalize) and stored each of these values in another list (d_dash).
I then created another empty list (d_bar) that holds the cumulative sum of the elements of d_dash.
I created a variable r, where r = random.uniform(0.0,1.0), and then, over the length of d_dash, compared r to d_dash[k]: if r<=d_dash[k], return A[k].
However, I'm getting the error "list index out of range"
near d_dash[j].append((A[j]/sum)). I'm not sure what the issue is, as I did not exceed the index of either d_dash or A.
Also, is my logic correct? Sharing a better way to do this would be appreciated.
Thanks in advance.
import random
A = [0,5,27,6,13,28,100,45,10,79]
def propotional_sampling(A):
    sum=0
    for i in range(len(A)):
        sum = sum + A[i]
    d_dash=[]
    for j in range(len(A)):
        d_dash[j].append((A[j]/sum))
    #cumulative sum
    d_bar =[]
    d_bar[0]= 0
    for k in range(len(A)):
        d_bar[k] = d_bar[k] + d_dash[k]
    r = random.uniform(0.0,1.0)
    number=0
    for p in range(len(d_bar)):
        if(r<=d_bar[p]):
            number=d_bar[p]
    return number
def sampling_based_on_magnitued():
    for i in range(1,100):
        number = propotional_sampling(A)
        print(number)
sampling_based_on_magnitued()
Upvotes: 2
Views: 4776
Reputation: 375
Below is the code to do the same:
import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

# Sum of all the elements in the list
S = sum(A)

# Normalize each element by the sum
norm_sum = [ele/S for ele in A]

# Calculating cumulative normalized sum
cum_norm_sum = []
cum_norm_sum.append(norm_sum[0])
for itr in range(1, len(norm_sum), 1) :
    cum_norm_sum.append(cum_norm_sum[-1] + norm_sum[itr])

def prop_sampling(cum_norm_sum) :
    """
    This function returns an element
    with proportional sampling.
    """
    r = random.random()
    for itr in range(len(cum_norm_sum)) :
        if r < cum_norm_sum[itr] :
            return A[itr]
    # Fallback in case rounding leaves the last cumulative sum just below 1
    return A[-1]

# Sampling 1000 elements from the given list with proportional sampling
sampled_elements = []
for itr in range(1000) :
    sampled_elements.append(prop_sampling(cum_norm_sum))
[Image: frequency of each element in the sampled points]
The number of times each element appears is roughly proportional to its magnitude.
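Since the frequency plot is not reproduced here, a rough way to check the proportionality claim is to tally the samples with collections.Counter and compare observed frequencies with value/sum. This is only a sketch: the fixed seed, the sample size of 10000, and the names `running` and `counts` are assumptions made for illustration, and the cumulative sums are built with a running total that is equivalent to the list construction in the answer.

```python
import random
from collections import Counter

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
S = sum(A)

# Cumulative normalized sums, accumulated with a running total.
cum_norm_sum = []
running = 0.0
for ele in A:
    running += ele / S
    cum_norm_sum.append(running)

def prop_sampling(cum_norm_sum):
    r = random.random()
    for itr in range(len(cum_norm_sum)):
        if r < cum_norm_sum[itr]:
            return A[itr]
    # Guard against rounding leaving the last cumulative sum just below 1.
    return A[-1]

random.seed(0)  # fixed seed (an assumption) so the tally is repeatable
n_samples = 10000
counts = Counter(prop_sampling(cum_norm_sum) for _ in range(n_samples))

# Compare observed relative frequency with the target probability x / S.
for x in sorted(A, reverse=True):
    print(f"{x:>3}: observed {counts[x] / n_samples:.3f}, expected {x / S:.3f}")
```

The observed and expected columns should agree to within sampling noise, and 0 should never be drawn because its interval has zero width.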
Upvotes: 1
Reputation: 923
The reason you got the "list index out of range" message is that you index into empty lists: "d_dash[j].append(...)" fails first because "d_dash=[]" has no element j yet, and "d_bar[k] = d_bar[k] + d_dash[k]" would fail the same way on the empty "d_bar =[]". I recommend using the following structure instead. First, define the list this way:
d_bar=[0 for i in range(len(A))]
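As a minimal illustration of why the empty list fails and of the two usual remedies (preallocating as above, or growing the list with append), consider the sketch below. The variable names follow the question; the try/except is there only to show the exception without stopping the script.

```python
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

# Indexing an empty list fails, whether reading or assigning.
d_bar = []
try:
    d_bar[0] = 0
except IndexError as exc:
    print("IndexError:", exc)

# Option 1: preallocate, then assign by index.
d_bar = [0 for i in range(len(A))]
d_bar[0] = A[0]

# Option 2: start empty and grow the list with append (no index needed).
total = sum(A)
d_dash = []
for value in A:
    d_dash.append(value / total)
```

In the question's code, `d_dash[j].append(...)` mixes the two styles; `d_dash.append(...)` is the append form.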
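Also, I believe this code will always return 1, because there is no break in the loop and the last cumulative value overwrites every earlier match. You can resolve this by adding "break" and by returning A[p] rather than d_bar[p]. Here is an updated version of your code: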
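import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def pick_a_number_from_list(A):
    # Sum of all elements (named total to avoid shadowing the built-in sum)
    total=0
    for i in A:
        total+=i
    # Normalize each element by the total
    A_norm=[]
    for j in A:
        A_norm.append(j/total)
    # Cumulative sum of the normalized values
    A_cum=[0 for i in range(len(A))]
    A_cum[0]=A_norm[0]
    for k in range(len(A_norm)-1):
        A_cum[k+1]=A_cum[k]+A_norm[k+1]
    r = random.uniform(0.0,1.0)
    number=0
    for p in range(len(A_cum)):
        if(r<=A_cum[p]):
            number=A[p]
            break  # stop at the first cumulative value that reaches r
    return number

def sampling_based_on_magnitued():
    # range(100) runs 100 experiments; range(1,100) would run only 99
    for i in range(100):
        number = pick_a_number_from_list(A)
        print(number)

sampling_based_on_magnitued()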
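For comparison, and assuming Python 3.6 or later, the standard library's random.choices draws weighted samples with replacement directly. It is not part of the answer above, and whether the exercise permits it (it only rules out numpy and pandas) is an assumption; the function name `sample_with_choices` is made up for this sketch.

```python
import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def sample_with_choices(values, k=100):
    # weights=values makes each element's probability proportional to
    # its magnitude; draws are made with replacement.
    return random.choices(values, weights=values, k=k)

samples = sample_with_choices(A)
print(samples)
```

This can serve as a cross-check for a hand-written sampler: over many draws, both should produce similar relative frequencies.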
Upvotes: 0
Reputation: 195408
The cumulative sum can be computed with itertools.accumulate. The loop:
    for p in range(len(d_bar)):
        if(r<=d_bar[p]):
            number=d_bar[p]
can be replaced by bisect.bisect() (doc):
import random
from itertools import accumulate
from bisect import bisect

A = [0,5,27,6,13,28,100,45,10,79]

def propotional_sampling(A, n=100):
    # calculate cumulative sum from A:
    cum_sum = [*accumulate(A)]
    # cum_sum = [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]
    out = []
    for _ in range(n):
        i = random.random()                     # i = [0.0, 1.0)
        idx = bisect(cum_sum, i*cum_sum[-1])    # get index to list A
        out.append(A[idx])
    return out

print(propotional_sampling(A))
Prints (for example):
[10, 100, 100, 79, 28, 45, 45, 27, 79, 79, 79, 79, 100, 27, 100, 100, 100, 13, 45, 100, 5, 100, 45, 79, 100, 28, 79, 79, 6, 45, 27, 28, 27, 79, 100, 79, 79, 28, 100, 79, 45, 100, 10, 28, 28, 13, 79, 79, 79, 79, 28, 45, 45, 100, 28, 27, 79, 27, 45, 79, 45, 100, 28, 100, 100, 5, 100, 79, 28, 79, 13, 100, 100, 79, 28, 100, 79, 13, 27, 100, 28, 10, 27, 28, 100, 45, 79, 100, 100, 100, 28, 79, 100, 45, 28, 79, 79, 5, 45, 28]
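To see how bisect maps a scaled random value to an index, here is a short sketch that probes a few hand-picked points on the cumulative scale. The probe values are arbitrary choices for illustration.

```python
from bisect import bisect
from itertools import accumulate

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
cum_sum = list(accumulate(A))  # [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]

# bisect returns the insertion point to the right of any equal entries,
# so x maps to the first element whose cumulative sum is greater than x.
for x in (0, 4.9, 5, 100, 312.9):
    idx = bisect(cum_sum, x)
    print(f"x = {x:>5}: index {idx}, element {A[idx]}")
```

Each element owns an interval whose width equals its value (for example, 100 owns [79, 179)), which is why a uniform point on [0, 313) selects it with probability 100/313. The element 0 owns an empty interval, so it is never returned.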
Upvotes: 0