Reputation: 71
I have a list:
a = [1,2,1,1,3,5,6,2]
I want to select, say 3 elements at random from this list, but they must all be different.
I need to preserve the 'weight' of each element, so sampling from set(a) is not possible.
So far, my solution is:
while condition == False:
mysample = random.sample(a, 3)
if len(set(mysample)) - len(mysample) !=0:
condition = False
else:
condition = True
But this forces me to re-sample as many times as it takes for the elements to all be different. This works fine for small sampling, but for large sampling, my code becomes very inefficient...
Upvotes: 2
Views: 113
Reputation: 79461
This is probably more complicated than necessary, but here is an implementation that uses a modified version of reservoir sampling.
import itertools
import random
def element_at(iterable, index, default=None):
return next(itertools.islice(iterable, index, None), default)
def sample_unique(iterable, size):
S = set()
for index, item in enumerate(iterable):
if len(S) < size:
S.add(item)
else:
r = random.randint(0, index)
if r < size:
other = element_at(S, r)
if item not in S:
S.remove(other)
S.add(item)
return S
Upvotes: 0
Reputation: 180411
l = []
seen = set()
while len(l) < 3:
ch = choice(a)
if ch not in seen:
l.append(ch)
seen.add(ch)
print(l)
Depending on the ratio of actual different numbers to elements different approaches will have different advantages:
In [7]: a = [choice(range(10000)) for _ in range(100000)]
In [6]: import random
In [7]: a = [choice(range(10000)) for _ in range(100000)]
In [8]: %%timeit
random.shuffle(a)
three_elements = set()
for v in a:
if len(three_elements) == 5000:
break
if not v in three_elements:
three_elements.add(v)
...:
10 loops, best of 3: 36.5 ms per loop
In [9]: %%timeit
l = []
seen = set()
while len(l) < 5000:
ch = choice(a)
if ch not in seen:
l.append(ch)
seen.add(ch)
...:
100 loops, best of 3: 5.16 ms per loop
Running your code after 10 mins I had to exit the process as it was still going so whatever you choose will be a major improvement.
Using shuffle would be more efficient if you had a greater ratio of repeats to actual items in the list and you wanted a very large sample size, otherwise the cost of shuffling will make it less efficient than simply using a set and choice,
Upvotes: 1
Reputation: 2486
You can shuffle and take the first three non repeated elements:
import random
random.shuffle(your_list)
three_elements = set()
for v in your_list:
if len(three_elements) == 3: break
three_elements.add(v)
Upvotes: 3
Reputation: 2130
while count < sampleSize: # where sampeSize is the number of values you want
s = random.sample(a, 1)
filter(lambda x: x != s, a)
mysample.append(s)
count += 1
Upvotes: 0