Bastien
Bastien

Reputation: 382

Pick subset from list at random and maintain equal number of picks overall in python

given a list of strings like so (in reality I have a much longer list but I'll keep it short for here):

items=['fish','headphones','wineglass','bowtie','cheese','hammer','socks']

I would like to pick a subset, say 3, of this list randomly so that items can only get picked once. This is easy enough using the following:

import itertools
import random
def random_combination(iterable, r):
    "Random selection from itertools.combinations(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(xrange(n), r))
    return tuple(pool[i] for i in indices)

items=['fish','headphones','wineglass','bowtie','cheese','hammer','socks']
randomPick=random_combination(items,3)

Next, to be a pain, I don't want to do this just once, but several times say 10 times. The final product would be 10 lists of randomly-only-picked-once items with the constraint that over those 10 lists items are presented an equal amount of times across lists. I'd like to avoid the "socks" to be picked up 10 times and the "hammer" only once for example.

This is the step that I'm stuck with, I simply don't know enough programming or enough about the available functions in python to perform such a thing.

Can anyone help?

Upvotes: 0

Views: 611

Answers (3)

Bastien
Bastien

Reputation: 382

Ok in the end I resorted in doing the following. It is a more constrained implementation where I set the number of times I want to see an item repeat, for example over the 10 lists I want each items to be picked 5 times:

List = ['airplane',
            'fish',
            'watch',
            'balloon',
            'headphones',
            'wineglass',
            'bowtie',
            'guitar',
            'desk',
            'bottle',
            'glove'] #there is more in my final list but keeping it short here
numIters = 5
numItems = len(List)
finalList=[]
for curList in range(numIters):
    random.shuffle(List)
    finalList.append(List[0 : numItems/2]) #append first list
    finalList.append(List[numItems/2 : -1]) #append second list

return finalList

Upvotes: 0

acushner
acushner

Reputation: 9946

i would do something like this:

items = set(items)
res = []
for _ in xrange(10):
    r = random.sample(items, 3)
    res.append(r)
    items -= set(r)

all this does is grab 3 elements, store them, and then subtract them from the original list so they can't be selected again.

Upvotes: 0

Narpar1217
Narpar1217

Reputation: 627

The following code might help. It pops a random element until (a copy of) iterable is empty, then starts over from the entire list. The downside is every item is picked once before a single item can be picked a second time. However, as you can see from the output, the distribution of items ends up about equal.

import random

def equal_distribution_combinations(iterable, n, csize):
    """
    Yield 'n' lists of size 'csize' containing distinct random elements
    from 'iterable.' Elements of 'iterable' are approximately evenly
    distributed across all yielded combinations.
    """
    i_copy = list(iterable)

    if csize > len(i_copy):
        raise ValueError(
            "csize cannot exceed len(iterable), as elements could not distinct."
        )

    for i in range(n):
        comb = []
        for j in range(csize):
            if not i_copy:
                i_copy = list(iterable)

            randi = random.randint(0, len(i_copy) - 1)

            # If i_coppy was reinstantiated it would be possible to have
            # duplicate elements in comb without this check.
            while i_copy[randi] in comb:
                randi = random.randint(0, len(i_copy) - 1)
            comb.append(i_copy.pop(randi))

        yield comb

Edit

Apologies for Python 3. The only change to the function for Python 2 should be range -> xrange.

Edit 2 (answering comment question)

equal_distribution_combinations should result in an even distribution for any n, csize, and length of iterable, as long as csize does not exceed len(iterable) (as the combination elements could not be distinct).

Here's a test using the specific numbers in your comment:

items = range(30)
item_counts = {k: 0 for k in items}

for comb in equal_distribution_combinations(items, 10, 10):
    print(comb)
    for e in comb:
        item_counts[e] += 1

print('')
for k, v in item_counts.items():
    print('Item: {0}  Count: {1}'.format(k, v))

Output:

[19, 28, 3, 20, 2, 9, 0, 25, 27, 12]
[29, 5, 22, 10, 1, 8, 17, 21, 14, 4]
[16, 13, 26, 6, 23, 11, 15, 18, 7, 24]
[26, 14, 18, 20, 16, 0, 1, 11, 10, 2]
[27, 21, 28, 24, 25, 12, 13, 19, 22, 6]
[23, 3, 8, 4, 15, 5, 29, 9, 7, 17]
[11, 1, 8, 28, 3, 13, 7, 26, 16, 23]
[9, 29, 14, 15, 17, 21, 18, 24, 12, 10]
[19, 20, 0, 2, 25, 5, 22, 4, 27, 6]
[12, 13, 24, 28, 6, 7, 26, 17, 25, 23]

Item: 0  Count: 3
Item: 1  Count: 3
Item: 2  Count: 3
Item: 3  Count: 3
Item: 4  Count: 3
Item: 5  Count: 3
Item: 6  Count: 4
Item: 7  Count: 4
Item: 8  Count: 3
Item: 9  Count: 3
Item: 10  Count: 3
Item: 11  Count: 3
Item: 12  Count: 4
Item: 13  Count: 4
Item: 14  Count: 3
Item: 15  Count: 3
Item: 16  Count: 3
Item: 17  Count: 4
Item: 18  Count: 3
Item: 19  Count: 3
Item: 20  Count: 3
Item: 21  Count: 3
Item: 22  Count: 3
Item: 23  Count: 4
Item: 24  Count: 4
Item: 25  Count: 4
Item: 26  Count: 4
Item: 27  Count: 3
Item: 28  Count: 4
Item: 29  Count: 3

As can be seen, the items are evenly distributed.

Upvotes: 1

Related Questions