darrenbarren

Reputation: 21

repeated random sampling (sub population)

I want to draw 400 repeated random samples (i.e. 400 different sampling outcomes), each of size 90. The complication is that the total population of 1800 (n_pop) consists of 3 sub-populations (300, 500, 1000), each normally distributed with its own standard deviation and mean, given in std_list and mean_list.

For example, the sub-population of 300 (the first entry of sub_pop) is normally distributed with a standard deviation of 40 and a mean of 50, and so on. In addition, each sub-population's share of the sample must be proportionate to its share of the total population (n_pop), which I have already hardcoded as samplesize = [10, 30, 50].

That is, I want a randomly generated sample of 10 from the sub-population of 300, a sample of 30 from the sub-population of 500, and so on. What I want is a list that holds the output of the 400 repeated random samples of size 90. This is what I've done so far:

import numpy as np
n_pop = 1800 #total population (300+500+1000=1800)
obs_size = 90 #sample size
sub_pop = [300, 500, 1000] #sub population
samplesize = [10, 30, 50]  #sub sample size (10+30+50=90)
std_list = [40, 50, 60] #standard deviation
mean_list = [50, 60, 70] #mean

list = []
for i in range(300):
    list += np.random.normal(loc = 50, scale = 40, size = 10).tolist()

for i in range(500):
    list += np.random.normal(loc = 60, scale = 50, size = 30).tolist()

for i in range(1000):
    list += np.random.normal(loc = 70, scale = 60, size = 50).tolist()

I'm unsure how to repeat the above 400 times and collect each result in a list.

Upvotes: 0

Views: 749

Answers (1)

Landar

Reputation: 276

You're almost there with the code:

import numpy as np
n_pop = 1800 #total population (300+500+1000=1800)
obs_size = 90 #sample size
sub_pop = [300, 500, 1000] #sub population
samplesize = [10, 30, 50]  #sub sample size (10+30+50=90)
std_list = [40, 50, 60] #standard deviation
mean_list = [50, 60, 70] #mean

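all_samples = []
for _ in range(400):
    # one sample of 90 values: 10 + 30 + 50 draws from the three sub-populations
    sample = []  # named 'sample' so the built-in list is not shadowed
    sample += np.random.normal(loc = 50, scale = 40, size = 10).tolist()
    sample += np.random.normal(loc = 60, scale = 50, size = 30).tolist()
    sample += np.random.normal(loc = 70, scale = 60, size = 50).tolist()
    all_samples.append(sample)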

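The loc, scale and size arguments to np.random.normal already describe each sub-population's mean, standard deviation and sample size, so there is no need to iterate 300 (or 500, or 1000) times over each call. The only loop you need is the one over the 400 repetitions.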
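If you would rather not hardcode the numbers, the same idea can be written against the lists from the question. This is only a sketch under a few assumptions: it uses NumPy's newer Generator interface (np.random.default_rng) instead of np.random.normal, the seed 0 and the name n_repeats are illustrative choices of mine, and it returns a 2-D array rather than a list of lists.

```python
import numpy as np

# Values taken from the question.
samplesize = [10, 30, 50]   # draws per sub-population (sums to 90)
std_list = [40, 50, 60]     # standard deviation per sub-population
mean_list = [50, 60, 70]    # mean per sub-population
n_repeats = 400             # number of repeated samples (hypothetical name)

# Assumption: a seeded Generator, used here only for reproducibility.
rng = np.random.default_rng(0)

# For each sub-population, draw an (n_repeats, size) block, then join the
# blocks side by side so that every row is one complete sample of 90 values.
all_samples = np.hstack([
    rng.normal(loc=mean, scale=std, size=(n_repeats, size))
    for mean, std, size in zip(mean_list, std_list, samplesize)
])

print(all_samples.shape)  # (400, 90)
```

Each row of all_samples is one repeated sample, with the first 10 columns drawn from the first sub-population, the next 30 from the second and the last 50 from the third. Calling all_samples.tolist() should give the same list-of-lists layout as the loop version.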

Upvotes: 1
