Matt Dowle

Reputation: 59612

How to solve memory error in mtrand.RandomState.choice?

I'm trying to sample 1e7 items from 1e5 strings but I get a memory error. Sampling 1e6 items from 1e4 strings works fine. I'm on a 64-bit machine with 4 GB of RAM and don't think I should be reaching any memory limit at 1e7. Any ideas?

$ python3
Python 3.3.3 (default, Nov 27 2013, 17:12:35) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> K = 100

Works fine with N=1e6:

>>> N = int(1e6)
>>> np.random.choice(["id%010d"%x for x in range(N//K)], N)
array(['id0000005473', 'id0000005694', 'id0000004115', ..., 'id0000006958',
       'id0000009972', 'id0000003009'], 
      dtype='<U12')

Error with N=1e7:

>>> N = int(1e7)
>>> np.random.choice(["id%010d"%x for x in range(N//K)], N)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 1092, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:8229)
MemoryError
>>> 

I found this question, "Python not catching MemoryError", but it seems to be about catching an error like this rather than solving it.

I'd be happy with either a solution still using random.choice or a different method to do this. Thanks.

Upvotes: 3

Views: 3538

Answers (1)

doctorlove

Reputation: 19262

You can work round this using a generator function:

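def item():
    # Lazily yield N random IDs; N, K and np are those defined in the question.
    for _ in range(N):
        # randint returns a plain integer in [0, N//K), which "%010d" can format.
        yield "id%010d" % np.random.randint(N // K)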

This avoids needing all the items in memory at once.
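
If you do need random access to the whole sample, another option is to sample integer indices and only turn them into strings when needed. This is a rough sketch, not a tested fix for the error above: the function name sample_ids and its seed parameter are made up for illustration. The idea is that a '<U12' array stores 4 bytes per character, so 1e7 results take roughly 480 MB, whereas an int32 index takes 4 bytes per sample and byte-string labels ('S12') take 1 byte per character.

```python
import numpy as np

def sample_ids(n_samples, n_ids, seed=None):
    """Draw n_samples IDs uniformly, with replacement, from n_ids distinct IDs.

    Hypothetical helper: returns (labels, idx) such that the sampled IDs
    are labels[idx]. Nothing here is part of NumPy's own API beyond
    RandomState.randint and ordinary array indexing.
    """
    rng = np.random.RandomState(seed)
    # ASCII byte strings give dtype 'S12' (1 byte per character).
    labels = np.array([("id%010d" % x).encode("ascii") for x in range(n_ids)])
    # One small integer per sample instead of one 12-character string.
    idx = rng.randint(0, n_ids, size=n_samples).astype(np.int32)
    return labels, idx

# Small sizes so the example runs quickly; the question uses 1e7 and 1e5.
labels, idx = sample_ids(n_samples=1000, n_ids=10, seed=0)
print(labels.dtype, idx.nbytes)   # dtype of the labels and bytes used by the index
print(labels[idx[:3]])            # materialize only the slice you need
```

With this layout you can keep idx around and build strings in chunks, for example labels[idx[start:stop]], instead of holding every string at once.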

Upvotes: 2
