FredMaster

Reputation: 1469

Draw a specified number of samples from a list, using all list elements

I have a list of elements. Now I want to specify the number of draws/samples I take from this list. However, I must ensure that

(i) all samples together include all original elements

(ii) the samples do not all have the same size

One update to my original question

UPDATE (iii) the minimum sample size is 2

Example:

lst = [1,2,3,4,5,6,7,8,9,10]
draws = 4
samples = some_function(draws, lst)
set(x for row in samples for x in row) == set(lst) # must be true

samples = [[1,2,3],[4,5],[6,7,8],[9,10]] # 4 draws, together include all elements, two different sample sizes, minimum sample size >= 2

Question: is there an easy way to do this using e.g. numpy.random?

Here is one attempt using np.random.permutation and np.random.choice. However, this approach does not always include all list elements in the final samples.

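import numpy as np

srch_list = list(range(100))
draws = 10
mid = round(len(srch_list)/draws)  # average sample size
n_leafs = range(mid-2,mid+3)       # candidate sample sizes around the average

rnd_list = np.random.permutation(srch_list)
leafs = []
for i in range(draws):
    idx = np.random.choice(n_leafs)  # size of this sample, drawn independently
    leafs.append(rnd_list[:idx])
    rnd_list = rnd_list[idx:]        # sizes need not sum to len(srch_list), so elements can be left over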


Upvotes: 2

Views: 630

Answers (3)

FredMaster

Reputation: 1469

Based on the first answer (by FBruzzesi) I came up with the following solution:

import numpy as np

def _sample_leaf_combinations(l: list, draws=10, minchunk=2):

    # Candidate cut indices to draw from. Note: drops some indices in order to
    # ensure that the distance between indices is at least minchunk.
    ldraw = list(range(minchunk, len(l)-1)[::minchunk])[:-1]
    if len(ldraw) <= draws - 1:
        raise ValueError(f"Cannot make {draws} draws from list of {len(l)} with minchunk of {minchunk}. Consider lowering minchunk")

    # Pick draws-1 distinct cut points and add both ends of the list
    ids = np.concatenate(([0], np.random.choice(ldraw, draws-1, replace=False), [len(l)]))
    ids = np.sort(ids)
    chunks = [l[i:j] for i,j in zip(ids[:-1], ids[1:])]

    return chunks

Thanks for your help!

Upvotes: 0

mapf

Reputation: 2088

Here is another solution:

import numpy as np


def draw_samples(pool, nsamples, min_sample_size=1):
    # make sure pool is an array for the logic to work
    if not isinstance(pool, np.ndarray):
        pool = np.array(pool)

    # first determine the total number of elements to be drawn from pool
    min_total_n_elements = len(pool) if len(pool) > nsamples*min_sample_size \
        else nsamples*min_sample_size
    max_total_n_elements = min_total_n_elements + 5  # the sky is the limit
    total_n_elements = np.random.randint(
        min_total_n_elements, max_total_n_elements
    )
    additional_n_elements = total_n_elements - min_total_n_elements

    # then extend the pool the samples are going to be drawn from
    extended_pool = np.append(
        pool, np.random.choice(pool, size=additional_n_elements)
    ) if additional_n_elements else pool

    # assign each element in the pool to a sample
    assignment = np.array(list(np.arange(nsamples))*min_sample_size)
    if total_n_elements - len(assignment):
        assignment = np.append(
            assignment, np.random.choice(
                np.arange(nsamples), size=total_n_elements - len(assignment)
            )
        )
    np.random.shuffle(assignment)
    samples = [extended_pool[assignment == i] for i in range(nsamples)]

    return samples


lst = np.arange(10)
n_subsamples = 4
samples = draw_samples(lst, n_subsamples, min_sample_size=2)
print(set.union(*[set(sample) for sample in samples]) == set(lst))
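
The print above checks only condition (i) from the question. A minimal sketch of a helper that checks all three conditions at once could look like this; the name check_samples and the returned dictionary keys are hypothetical and not part of any answer here:

```python
def check_samples(samples, pool, min_size=2):
    """Report whether samples satisfy conditions (i)-(iii) from the question."""
    sizes = [len(s) for s in samples]
    # (i) the union of all samples equals the original pool
    covers_all = set().union(*(set(s) for s in samples)) == set(pool)
    # (iii) every sample has at least min_size elements
    min_size_ok = all(n >= min_size for n in sizes)
    # (ii) at least two samples differ in size
    sizes_differ = len(set(sizes)) > 1
    return {
        "covers_all": covers_all,
        "min_size_ok": min_size_ok,
        "sizes_differ": sizes_differ,
    }


# Example from the question: all three conditions hold
print(check_samples([[1, 2, 3], [4, 5], [6, 7, 8], [9, 10]], range(1, 11)))
```

It works for lists as well as NumPy arrays, since it only relies on len() and iteration.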

Upvotes: 0

FBruzzesi

Reputation: 6505

One way of doing it:

import numpy as np

np.random.seed(1)

l = [1,2,3,4,5,6,7,8,9,10]

ids = np.concatenate(([0],
                     np.random.choice(range(1, len(l)-1), 3, replace=False),
                     [len(l)]))

ids = np.sort(ids)

chunks = [l[i:j] for i,j in zip(ids[:-1], ids[1:])]

chunks
[[1, 2], [3], [4, 5, 6, 7, 8], [9, 10]]

Now if you also need to shuffle elements of the list you can use numpy.random.shuffle:

np.random.shuffle(l)
chunks = [l[i:j] for i,j in zip(ids[:-1], ids[1:])]

chunks
[[5, 9], [3], [10, 1, 6, 8, 7], [2, 4]]
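
The cut-point approach above can produce chunks of size 1, which conflicts with the minimum size of 2 added in the question update. One possible variation, sketched here under my own assumptions, is to reserve min_size - 1 elements per chunk, draw the cut points on the shortened range, and then add the reserved elements back to each chunk size. The function name split_with_min_size and its parameters are hypothetical, and it does not enforce that the sizes differ:

```python
import numpy as np


def split_with_min_size(items, draws, min_size=2, rng=None):
    """Split items into `draws` contiguous chunks of at least min_size elements."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(items)
    if min_size < 1 or draws < 1 or n < draws * min_size:
        raise ValueError(
            f"Cannot make {draws} draws of size >= {min_size} from {n} elements"
        )
    # Reserve (min_size - 1) elements per chunk and split the remainder
    # into `draws` parts of size >= 1 using distinct cut points.
    total = n - draws * (min_size - 1)
    cuts = np.sort(rng.choice(np.arange(1, total), size=draws - 1, replace=False))
    parts = np.diff(np.concatenate(([0], cuts, [total])))
    # Add the reserved elements back so every chunk has >= min_size elements
    sizes = parts + (min_size - 1)
    bounds = np.concatenate(([0], np.cumsum(sizes)))
    return [items[i:j] for i, j in zip(bounds[:-1], bounds[1:])]


l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chunks = split_with_min_size(l, draws=4, min_size=2)
print(chunks)
```

If the sizes must not all be equal, the caller could redraw until they differ, and shuffling l beforehand randomizes which elements land in which chunk.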

Upvotes: 1
