Reputation: 1469
I have a list of elements and want to specify the number of draws/samples taken from it. The samples must satisfy the following:
(i) taken together, the samples include all original elements
(ii) the samples should not all have the same size
UPDATE to my original question:
(iii) the minimum sample size is 2
Example:
list = [1,2,3,4,5,6,7,8,9,10]
draws = 4
samples = some_function(draws,list)
set(x for row in samples for x in row) == set(list) # must be true
samples = [[1,2,3],[4,5],[6,7,8],[9,10]]
# 4 draws, together include all elements, two different sample sizes, minimum sample size >= 2
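The three conditions above can be written as a small checker, which is handy for testing any candidate solution. This is only a sketch: the name check_samples is made up for illustration, and it assumes condition (ii) means "not all samples have the same size", as in the example.

```python
def check_samples(samples, pool, min_size=2):
    """Return True if `samples` satisfies conditions (i)-(iii) for `pool`.

    Hypothetical helper; assumes (ii) means the sizes are not all identical.
    """
    # (i) the union of all samples equals the set of original elements
    covers_all = set(x for s in samples for x in s) == set(pool)
    sizes = [len(s) for s in samples]
    # (ii) at least two different sample sizes occur
    sizes_differ = len(set(sizes)) > 1
    # (iii) every sample has at least min_size elements
    big_enough = min(sizes) >= min_size
    return covers_all and sizes_differ and big_enough
```

For the example above, check_samples([[1,2,3],[4,5],[6,7,8],[9,10]], range(1, 11)) evaluates to True.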
Question: is there an easy way to do this using e.g. numpy.random?
Here is one attempt using np.random.permutation and np.random.choice. However, this approach does not always include all list elements in the final samples.
import numpy as np

srch_list = list(range(100))
draws = 10
mid = round(len(srch_list)/draws)
n_leafs = range(mid-2,mid+3)
rnd_list = np.random.permutation(srch_list)
leafs = []
for i in range(draws):
    idx = np.random.choice(n_leafs)
    leafs.append(rnd_list[:idx])
    rnd_list = rnd_list[idx:]
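To see where elements go missing: the chunk sizes are drawn independently, so their sum rarely equals the length of the list, and whatever is still in the remainder after the loop never reaches a sample. A sketch of the same attempt that also returns that remainder (the function name attempt_with_leftover and the use of np.random.default_rng are my own choices here, not part of the attempt above):

```python
import numpy as np

def attempt_with_leftover(pool, draws, rng):
    """Same logic as the attempt above, but also return the unused elements."""
    mid = round(len(pool) / draws)
    sizes = np.arange(mid - 2, mid + 3)   # candidate chunk sizes
    remaining = rng.permutation(pool)
    leafs = []
    for _ in range(draws):
        idx = rng.choice(sizes)           # each size is drawn independently
        leafs.append(remaining[:idx])
        remaining = remaining[idx:]
    return leafs, remaining

rng = np.random.default_rng(0)
leafs, leftover = attempt_with_leftover(list(range(100)), 10, rng)
print(sum(len(leaf) for leaf in leafs), len(leftover))
```

Whenever the drawn sizes sum to less than len(pool), the leftover is non-empty; when they sum to more, the last chunks come out shorter than requested.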
Upvotes: 2
Views: 630
Reputation: 1469
Based on the first answer (by FBruzzesi) I came up with the following solution:
import numpy as np

def _sample_leaf_combinations(l: list, draws=10, minchunk=2):
    # List to draw cut indices from. Note: drops some indices in order to
    # ensure that the distance between indices is at least minchunk.
    ldraw = list(range(minchunk, len(l)-1)[::minchunk])[:-1]
    if len(ldraw) <= draws - 1:
        raise ValueError(f"Cannot make {draws} draws from list of {len(l)} with minchunk of {minchunk}. Consider lowering minchunk")
    ids = np.concatenate(([0], np.random.choice(ldraw, draws-1, replace=False), [len(l)]))
    ids = np.sort(ids)
    chunks = [l[i:j] for i, j in zip(ids[:-1], ids[1:])]
    return chunks
Thanks for your help!
Upvotes: 0
Reputation: 2088
Here is another solution:
import numpy as np
def draw_samples(pool, nsamples, min_sample_size=1):
    # make sure pool is an array for the logic to work
    if not isinstance(pool, np.ndarray):
        pool = np.array(pool)
    # first determine the total number of elements to be drawn from pool
    min_total_n_elements = len(pool) if len(pool) > nsamples*min_sample_size \
        else nsamples*min_sample_size
    max_total_n_elements = min_total_n_elements + 5  # the sky is the limit
    total_n_elements = np.random.randint(
        min_total_n_elements, max_total_n_elements
    )
    # extra elements needed beyond the pool itself, so that the extended pool
    # and the assignment array below end up with the same length
    additional_n_elements = total_n_elements - len(pool)
    # then extend the pool the samples are going to be drawn from
    extended_pool = np.append(
        pool, np.random.choice(pool, size=additional_n_elements)
    ) if additional_n_elements else pool
    # assign each element in the pool to a sample
    assignment = np.array(list(np.arange(nsamples))*min_sample_size)
    if total_n_elements - len(assignment):
        assignment = np.append(
            assignment, np.random.choice(
                np.arange(nsamples), size=total_n_elements - len(assignment)
            )
        )
    np.random.shuffle(assignment)
    samples = [extended_pool[assignment == i] for i in range(nsamples)]
    return samples

lst = np.arange(10)
n_subsamples = 4
samples = draw_samples(lst, n_subsamples, min_sample_size=2)
print(set.union(*[set(sample) for sample in samples]) == set(lst))
Upvotes: 0
Reputation: 6505
One way of doing it:
import numpy as np
np.random.seed(1)
l = [1,2,3,4,5,6,7,8,9,10]
ids = np.concatenate(([0],
                      np.random.choice(range(1, len(l)-1), 3, replace=False),
                      [len(l)]))
ids = np.sort(ids)
chunks = [l[i:j] for i,j in zip(ids[:-1], ids[1:])]
chunks
[[1, 2], [3], [4, 5, 6, 7, 8], [9, 10]]
Now if you also need to shuffle elements of the list you can use numpy.random.shuffle:
np.random.shuffle(l)
chunks = [l[i:j] for i,j in zip(ids[:-1], ids[1:])]
chunks
[[5, 9], [3], [10, 1, 6, 8, 7], [2, 4]]
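The cut-point approach above can produce chunks of size 1. If a minimum size is required (the update in the question), one possible adaptation is to draw the cut points in a shortened index space where every chunk only needs size 1, and then add min_size - 1 to each chunk size. This is a sketch under my own assumptions: the function name split_with_min_size and the rng parameter are hypothetical, and it does not enforce that the sizes differ (a redraw could handle that where needed).

```python
import numpy as np

def split_with_min_size(items, draws, min_size=2, rng=None):
    """Split `items` into `draws` consecutive chunks of at least `min_size` elements."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(items)
    slack = n - draws * min_size          # elements left once each chunk has min_size
    if slack < 0:
        raise ValueError("not enough elements for the requested draws and min_size")
    # Draw draws-1 distinct cut points in a shortened space of length slack + draws,
    # where each chunk is only guaranteed to have size >= 1.
    reduced_n = slack + draws
    cuts = np.sort(rng.choice(np.arange(1, reduced_n), size=draws - 1, replace=False))
    sizes = np.diff(np.concatenate(([0], cuts, [reduced_n])))
    sizes = sizes + (min_size - 1)        # lift every chunk to at least min_size
    bounds = np.concatenate(([0], np.cumsum(sizes)))
    return [items[i:j] for i, j in zip(bounds[:-1], bounds[1:])]

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chunks = split_with_min_size(l, draws=4, min_size=2, rng=np.random.default_rng(1))
```

As before, shuffling l first randomizes which elements end up in which chunk.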
Upvotes: 1