Reputation: 135

Inefficient Random Dice Roll in Python/Jython

As I learn Python (Specifically Jython, if the difference is important here) I'm writing a simple terminal game that uses skills and dice rolls based on the level of those skills to determine success/fail at an attempted action. I hope to use this code eventually in a larger game project.

Under a stress test, the code uses .5GB of ram and seems to take quite a while to get a result (~50 seconds). It could just be that the task is really that intensive but as a noob I'm betting I'm just doing things inefficiently. Could anyone give some tips on both:

how to improve the efficiency of this code

and how to write this code in a more pythonic way?

import random

def DiceRoll(maxNum=100,dice=2,minNum=0):
  return sum(random.randint(minNum,maxNum) for i in xrange(dice))

def RollSuccess(max):
  x = DiceRoll()
  if(x <= (max/10)):
    return 2
  elif(x <= max):
    return 1
  elif(x >= 100-(100-max)/10):
    return -1
  return 0

def RollTesting(skill=50,rolls=10000000):
  cfail = 0
  fail = 0
  success = 0
  csuccess = 0
  for i in range(rolls+1):
    roll = RollSuccess(skill)
    if(roll == -1):
      cfail = cfail + 1
    elif(roll == 0):
      fail = fail + 1
    elif(roll == 1):
      success = success + 1
    else:
      csuccess = csuccess + 1
  print "CFails: %.4f. Fails: %.4f. Successes: %.4f. CSuccesses: %.4f." % (float(cfail)/float(rolls), float(fail)/float(rolls), float(success)/float(rolls), float(csuccess)/float(rolls))

RollTesting()

EDIT - Here's my code now:

from random import random

def DiceRoll():
   return 50 * (random() + random())

def RollSuccess(suclim):
  x = DiceRoll()
  if(x <= (suclim/10)):
    return 2
  elif(x <= suclim):
    return 1
  elif(x >= 90-suclim/10):
    return -1
  return 0

def RollTesting(skill=50,rolls=10000000):
  from time import clock
  start = clock()
  cfail = fail = success = csuccess = 0.0
  for _ in xrange(rolls):
    roll = RollSuccess(skill)
    if(roll == -1):
      cfail += 1
    elif(roll == 0):
      fail += 1
    elif(roll == 1):
      success += 1
    else:
      csuccess += 1
  stop = clock()
  print "Last time this statement was manually updated, DiceRoll and RollSuccess totaled 12 LOC."
  print "It took %.3f seconds to do %d dice rolls and calculate their success." % (stop-start,rolls)
  print "At skill level %d, the distribution is as follows" % (skill)
  print "CFails: %.4f. Fails: %.4f. Successes: %.4f. CSuccesses: %.4f." % (cfail/rolls, fail/rolls, success/rolls, csuccess/rolls)

RollTesting(50)

And the output:

Last time this statement was manually updated, DiceRoll and RollSuccess totaled 12 LOC.
It took 6.558 seconds to do 10000000 dice rolls and calculate their success.
At skill level 50, the distribution is as follows
CFails: 0.0450. Fails: 0.4548. Successes: 0.4952. CSuccesses: 0.0050.

Noticeably this isn't equivalent because I changed the random calculation enough to be noticeably different output (the original was supposed to be 0-100, but I forgot to divide by the amount of dice). The mem usage looks to be ~.2GB now. Also the previous implementation couldn't do 100mil tests, I've ran this one at up to 1bil tests (it took 8 minutes, and the mem usage doesn't seem significantly different).

Upvotes: 1

Answers (4)

abarnert

Reputation: 366013

You're doing 10 million loops. Just the looping costs are probably 10% of your total time. Then, if the whole loop doesn't fit into cache at once, it may slow things down even more.

Is there a way to avoid doing all those loops in Python? Yes, you can do them in Java.

The obvious way to do that is to actually write and call Java code. But you don't have to do that.

A list comprehension, or a generator expression driven by a native builtin, will also do the looping in Java. So, on top of being more compact and simpler, this should also be faster:

attempts = (RollSuccess(skill) for i in xrange(rolls))
counts = collections.Counter(attempts)
cfail, fail, success, csuccess = counts[-1], counts[0], counts[1], counts[2]

Unfortunately, while this does seem to be faster in Jython 2.7b1, it's actually slower in 2.5.2.

Another way to speed up loops is to use a vectorization library. Unfortunately, I don't know what Jython people use for this, but in CPython with numpy, it looks something like this:

def DiceRolls(count, maxNum=100, dice=2, minNum=0):
    return sum(np.random.random_integers(minNum, maxNum, count) for die in range(dice))

def RollTesting(skill=50, rolls=10000000):
    dicerolls = DiceRolls(rolls)
    csuccess = np.count_nonzero(dicerolls <= skill/10)
    success = np.count_nonzero((dicerolls > skill/10) & (dicerolls <= skill))
    fail = np.count_nonzero((dicerolls > skill) & (dicerolls <= 100-(100-skill)/10))
    cfail = np.count_nonzero((dicerolls > 100-(100-skill)/10)

This speeds things up by a factor of about 8.

I suspect that in Jython things aren't nearly as nice as with numpy, and you're expected to import Java libraries like the Apache Commons numerics or PColt and figure out the Java-vs.-Python issues yourself… but better to search and/or ask than to assume.

Finally, you may want to use a different interpreter. CPython 2.5 or 2.7 doesn't seem to be much different from Jython 2.5 here, but it does mean you can use numpy to get an 8x improvement. PyPy 2.0, meanwhile, is 11x faster, with no changes.

Even if you need to do your main program in Jython, if you've got something slow enough to dwarf the cost of starting a new process, you can move it to a separate script that you run via subprocess. For example:

subscript.py:

# ... everything up to the RollTesting's last line
    return csuccess, success, fail, cfail

skill = int(sys.argv[1]) if len(sys.argv) > 1 else 50
rolls = int(sys.argv[2]) if len(sys.argv) > 2 else 10000000
csuccess, success, fail, cfail = RollTesting(skill, rolls)
print csuccess
print success
print fail
print cfail

mainscript.py:

def RollTesting(skill, rolls):
    results = subprocess32.check_output(['pypy', 'subscript.py', 
                                         str(skill), str(rolls)])
    csuccess, success, fail, cfail = (int(line.rstrip()) for line in results.splitlines())
    print "CFails: %.4f. Fails: %.4f. Successes: %.4f. CSuccesses: %.4f." % (float(cfail)/float(rolls), float(fail)/float(rolls), float(success)/float(rolls), float(csuccess)/float(rolls))

(I used the subprocess32 module to get the backport of check_output, which isn't available in Python 2.5, Jython or otherwise. You could also just borrow the source for check_output from 2.7's implementation.)

Note that Jython 2.5.2 has some serious bugs in subprocess (which will be fixed in 2.5.3 and 2.7.0, but that doesn't help you today). But fortunately, they don't affect this code.

In a quick test, the overhead (mostly spawning a new interpreter process, but there's also marshalling the parameters and results, etc.) added more than 10% to the cost, meaning I only got a 9x improvement instead of 11x. And that will be a little worse on Windows. But not enough to negate the benefits for any script that's taking on the order of a minute to run.

Finally, if you're doing more complicated stuff, you can use execnet, which wraps up Jython<->CPython<->PyPy to let you use whatever's best in each part of the code without having to do all that explicit subprocess stuff.

Upvotes: 4

llb

Reputation: 1741

If you're concerned about efficiency, use a profiler. Here's 100,000 rolls:

Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (In
tel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import scratch
CFails: 0.5522. Fails: 0.3175. Successes: 0.1285. CSuccesses: 0.0019.
         1653219 function calls in 5.433 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.433    5.433 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 cp437.py:18(encode)
   200002    0.806    0.000    2.526    0.000 random.py:165(randrange)
   200002    0.613    0.000    3.139    0.000 random.py:210(randint)
   200002    1.034    0.000    1.720    0.000 random.py:216(_randbelow)
        1    0.181    0.181    5.433    5.433 scratch.py:17(RollTesting)
   100001    0.371    0.000    4.864    0.000 scratch.py:4(DiceRoll)
   300003    0.769    0.000    3.908    0.000 scratch.py:5(<genexpr>)
   100001    0.388    0.000    5.251    0.000 scratch.py:7(RollSuccess)
        2    0.000    0.000    0.000    0.000 {built-in method charmap_encode}
        1    0.000    0.000    5.433    5.433 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {built-in method print}
   100001    0.585    0.000    4.493    0.000 {built-in method sum}
   200002    0.269    0.000    0.269    0.000 {method 'bit_length' of 'int' obje
cts}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Prof
iler' objects}
   253196    0.417    0.000    0.417    0.000 {method 'getrandbits' of '_random.
Random' objects}

Obviously this code is slightly different because I am running in Python 3, but you should be able to see that you spend most of your time in various random.py functions, in summing, and in the generator expression. This is basically where you would expect to spend your time, really, but can we optimize this further?

Currently DiceRoll generates two random numbers and adds them together. This is an approximation of a normal distribution. Why bother actually rolling the dice? 2d100 is a normal distribution with an average of 101 and a standard deviation of 40.82. (Since these dice actually go from 0 to 99, we can take a couple of points off.)

def DiceRoll2():
  return int(random.normalvariate(99, 40.82))

Use the built-in function for the job.

>>> timeit.timeit('scratch.DiceRoll()', 'import scratch')
7.253364044871624

>>> timeit.timeit('scratch.DiceRoll2()', 'import scratch')
1.8604163378306566

Here's the profiler using DiceRoll2 to run 100,000 rolls:

Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (In
tel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import scratch
CFails: 0.5408. Fails: 0.3404. Successes: 0.1079. CSuccesses: 0.0108.
         710724 function calls in 2.275 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.275    2.275 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 cp437.py:18(encode)
   100001    0.819    0.000    1.393    0.000 random.py:354(normalvariate)
   100001    0.361    0.000    2.094    0.000 scratch.py:10(RollSuccess)
        1    0.180    0.180    2.275    2.275 scratch.py:20(RollTesting)
   100001    0.340    0.000    1.733    0.000 scratch.py:7(DiceRoll2)
        2    0.000    0.000    0.000    0.000 {built-in method charmap_encode}
        1    0.000    0.000    2.275    2.275 {built-in method exec}
   136904    0.203    0.000    0.203    0.000 {built-in method log}
        1    0.000    0.000    0.000    0.000 {built-in method print}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   273808    0.371    0.000    0.371    0.000 {method 'random' of '_random.Random' objects}

This halved the time.

If the majority of your die rolls will be one specific type of roll, you should probably just use the random function that generates the specific distribution you'll get for that roll.

Upvotes: 0

JesseTG

Reputation: 2123

All those calls to float() can be cut by defining those local variables in RollTesting() as 0.0. 0 is an int constant, 0.0 is a float constant. If there's even one float involved in any arithmetic operation, another float is returned.

Second, you forgot to change that range() in RollTesting() to xrange().

Third, Python has the usual +=, *=, -=, etc. operators, so fail = fail + 1 becomes fail += 1. Python doesn't have -- or ++, though.

Lastly, you don't need parentheses in if statements.

Upvotes: 1

Silas Ray

Reputation: 26160

Well, one thing, use xrange instead of range. range allocates an array with an element for each of the 10 million digits whereas xrange creates a generator. That's going to help out memory a bunch, and probably speed too.

Upvotes: 2

Inefficient Random Dice Roll in Python/Jython

Answers (4)

Related Questions