Thomas Johnson

Reputation: 11648

More efficient shuffling during repeated iterations

I need to iterate over a list an arbitrary number of times, yielding each element of the list in random order (a different order each time I iterate through the full list). I need to yield each element once before yielding that element a second time, yield each element twice before yielding that element a third time, etc.

Currently, my code looks like this:

import random

def random_yield(data):
  random.shuffle(data)
  data_index = 0
  while True:
    yield data[data_index]
    data_index += 1

    if data_index == len(data):
      random.shuffle(data)
      data_index = 0

Is there a way to do this more efficiently so I don't pay the performance penalty of the random.shuffle() after every len(data) yields?

Upvotes: 0

Views: 165

Answers (1)

rici

Reputation: 241691

You can perform one step of a Fisher-Yates shuffle per iteration, distributing the cost evenly across iterations. That's not more efficient overall (the library's shuffle is likely faster than equivalent Python code), but it avoids long pauses.

The code is not significantly different from just grabbing a random element each time. The only difference is that you grab the random elements from the still-unused portion of the list:

from random import randrange
def random_yield(data):
  index = 0
  limit = len(data)
  while True:
    if index + 1 >= limit:
      yield data[index]
      index = 0
    else:
      # Get a random element which we haven't yet used this cycle
      # (This is a single iteration of the F-Y shuffle algorithm)
      j = randrange(index, limit)
      rv = data[j]
      yield rv
      # Swap the selected element out of the unused subrange so it's
      # not picked again this cycle
      data[j], data[index] = data[index], rv
      index += 1
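
As a quick sanity check, each block of len(data) yields should be a permutation of the original elements. A minimal standalone sketch (the generator is repeated here so the example runs on its own):

```python
from random import randrange
from itertools import islice

def random_yield(data):
    # Incremental Fisher-Yates generator, as in the answer above
    index = 0
    limit = len(data)
    while True:
        if index + 1 >= limit:
            yield data[index]
            index = 0
        else:
            # Pick a random element from the not-yet-used portion
            j = randrange(index, limit)
            rv = data[j]
            yield rv
            # Swap it out of the unused subrange
            data[j], data[index] = data[index], rv
            index += 1

data = list(range(10))
gen = random_yield(data)
for _ in range(3):
    # Each full cycle yields every element exactly once, in random order
    cycle = list(islice(gen, len(data)))
    assert sorted(cycle) == list(range(10))
```

Because the swaps happen in place, `data` remains a permutation of its original contents at every point, which is what guarantees the once-per-cycle property.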

Upvotes: 3

Related Questions