poppyseeds

Reputation: 451

Extract random values from list that fulfil criteria? Python

Is it possible to use the random module to extract strings from a list, but only if the string has a length greater than x?

For example:

list_of_strings = ['Hello', 'Hello1', 'Hello2']

If you set x = 5 and call random.choice() the code would be 'choosing' between only list_of_strings[1] and list_of_strings[2].

I realise you could make a second list which contains only values of len > x, but I would like to know if it is possible without this step.

Upvotes: 2

Views: 98

Answers (3)

Francisco

Reputation: 11476

random.choice([s for s in list_of_strings if len(s) > x])

Or you could do something like this:

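# wrapped in a function (the name is illustrative) so that `return` is valid
def pick_longer(list_of_strings, x):
    while True:
        choice = random.choice(list_of_strings)
        if len(choice) > x:
            return choice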

You should check first if there are strings in the list that are longer than x, otherwise that code will never end.

Another possible solution is to use reservoir sampling, which has the additional benefit of a bounded running time.
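
For reference, here is a minimal sketch of what reservoir sampling with a reservoir of size one could look like for this problem. The function name choose_if_reservoir and the choice to return None when nothing matches are my own assumptions, not part of the answer above.

```python
import random

def choose_if_reservoir(predicate, items):
    """Return a uniformly random item satisfying predicate, or None if none does.

    Hypothetical helper: makes a single pass and builds no second list.
    """
    chosen = None
    seen = 0  # number of matching items encountered so far
    for item in items:
        if predicate(item):
            seen += 1
            # Keep the new item with probability 1/seen, so every matching
            # item ends up selected with equal probability.
            if random.randrange(seen) == 0:
                chosen = item
    return chosen

list_of_strings = ['Hello', 'Hello1', 'Hello2']
x = 5
print(choose_if_reservoir(lambda s: len(s) > x, list_of_strings))
```

It always does exactly one pass over the list, so it cannot loop forever. Note that a None result is ambiguous if None can itself be a valid element; raising an exception instead would be a reasonable alternative.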

Upvotes: 4

arekolek

Reputation: 9611

Another solution that doesn't create an additional list:

from itertools import islice
from random import randrange

def choose_if(f, s):
  return next(islice(filter(f, s), randrange(sum(map(f, s))), None))

choose_if(lambda x: len(x) > 5, list_of_strings)

It turns out to be almost two times slower than Christian's solution, because it iterates over s twice, applying f to every element each time. That cost is high enough to outweigh the gain from not creating a second list.

Francisco's loop-based solution, on the other hand, can be 10 to 100 times faster than that, because it applies f only once per attempt, that is, once for each failed pick plus the final successful one. Here's a complete version of that function:

from random import choice

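def choose_if(f, s):
  # any(map(f, s)) tests the predicate results; any(filter(f, s)) would test
  # the truthiness of the matching elements and miss falsy ones such as ''
  if any(map(f, s)):
    while True:
      x = choice(s)
      if f(x):
        return x
  # falls through and returns None when no element satisfies f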

Bear in mind it starts to get worse when few elements (fewer than 1%) satisfy the condition. When only 1 element in 5000 was good, it was 5 times slower than using a list comprehension.
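
If you want to check this on your own machine, a rough timing harness could look like the sketch below. The function names, the test data (5000 strings with exactly one match) and the repeat count are assumptions I picked for illustration, and the numbers you get will depend on your hardware and Python version.

```python
import random
import timeit

def by_comprehension(f, s):
    # Builds a second list of matching elements, then picks from it.
    return random.choice([x for x in s if f(x)])

def by_rejection(f, s):
    # Picks at random and retries until the predicate holds.
    if any(map(f, s)):
        while True:
            x = random.choice(s)
            if f(x):
                return x

# Hypothetical data: only 1 element in 5000 satisfies the condition.
data = ['short'] * 4999 + ['longer']

def is_long(s):
    return len(s) > 5

for fn in (by_comprehension, by_rejection):
    seconds = timeit.timeit(lambda: fn(is_long, data), number=50)
    print(fn.__name__, seconds)
```

Changing how many elements in data satisfy is_long lets you see where the retry loop stops paying off.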

Upvotes: 1

Christian

Reputation: 729

You could do this:

random.choice([i for i in list_of_strings if len(i) > x])

Upvotes: 0
