Increasing efficiency of a large lottery simulation

Question

I created a python lottery simulator that generates an array of 5 random numbers from 0-75 (the 'winning numbers'):

    for n in xrange(0, 5):
    winning_number.append(randint(0, 75))

It would then check a user-defined number of 'tickets' (the 'count' variable) against the 'winning' array.

while count > 0:
    for n in xrange(0, 5):
        result.append(randint(0, 75))
    if result == winning_number:
        wingame += 1
        count -= 1
        del result[:]
    else:
        count -= 1
        del result[:]

As expected, count values above about 10 million start to take a long time and begin to tax my system resources, but I really want to run much larger values of count. Are there any modifications that I could make to help avoid superfluous steps? Also, would using something like PyCUDA or pyOpenCl be effective and/or within the reach of someone pretty new to coding?

I appreciate any help that you can provide or resources that you could point me to. Thanks!

Hugh Bothwell · Accepted Answer

Your lottery math is kind of odd;

First, random.randint(0, 75) generates values that include both endpoints - so you are picking from 76 values (0 .. 75). Maybe you meant (1, 75)?

Second, you are allowing repeated values, ie 3, 3, 3, 3, 3 is a valid ticket. Lotteries generally do not allow repeats. Take a look at random.sample, ie random.sample(range(1, 76), 5)

Third, the order in which you pick values apparently matters - 1, 2, 3, 4, 5 is different than 1, 3, 2, 4, 5. Lotteries generally do not consider the order (except for possibly a bonus number). In Python terms, you should be comparing sets instead of lists.

Fourth, actually generating lists of values involves a lot of allocating and deallocating memory; you could get the same effect more cheaply by operating on enumerations of states rather than actually generating each state. For example, you could say that {1, 4, 7, 19, 21} is the millionth-and-third combination, then test randvalue == 1000003 instead of randset == {1, 4, 7, 19, 21}.

Implementing these changes, you could simplify your logic like

from random import randrange
from math import factorial

VALUES = 75
PICKS = 5
TICKETS = 10000000

# calculate number of unique tickets
num_combos = factorial(VALUES) // (factorial(VALUES - PICKS) * factorial(PICKS))

winner = randrange(num_combos)
num_winners = sum(randrange(num_combos) == winner for _ in range(TICKETS))

Edit: had a quick thought and tested it;

num_winners = sum(1 for _ in range(TICKETS) if randrange(num_combos) == winner)

is about 5% faster. (More than 85% of runtime is now spent just generating random values.)

Edit2: another thought - if we arbitrarily say "0 is the winning state", then our main loop becomes

num_winners = sum(1 for _ in range(TICKETS) if not randrange(num_combos))

which speeds it up by another 2.5%.

Alternatively you could just go directly to the final solution,

from numpy.random import poisson

num_winners = poisson(TICKETS / num_combos)

Increasing efficiency of a large lottery simulation

Answers (1)

Related Questions