PasqualeSAL

Reputation: 19

Python MemoryError when using numpy.empty

I have this code:

import numpy

size_of_similarity_M = 80000
similarity_M = numpy.empty((size_of_similarity_M, size_of_similarity_M))

And I am getting this error:

Traceback (most recent call last):
  File "<ipython-input-1337-7f9234015aae>", line 1, in <module>
    runfile('C:/Users/cp1/PythonScript/Try_replace_function.py', wdir='C:/Users/cp1/PythonScript')
  File "C:\Users\cp1\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\cp1\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/cp1/PythonScript/Try_replace_function.py", line 14, in <module>
    similarity_M = numpy.empty((size_of_similarity_M,size_of_similarity_M))

MemoryError

How can I solve this easily, without updating the whole script? Of course, for a small value of size_of_similarity_M it works fine. I would rather not change the format of similarity_M, since I use the output of this matrix in another script.

Upvotes: 1

Views: 372

Answers (1)

omri_saadon

Reputation: 10641

numpy arrays are meant to live in memory. An 80000 x 80000 array of float64 values needs 80000 * 80000 * 8 bytes, about 51 GB, which is far more than typical RAM. If you want to work with matrices larger than your RAM, you have to work around that. There are at least two approaches you can follow:

  1. Try a more efficient matrix representation that exploits any special structure your matrices have. For example, as others have already pointed out, there are efficient data structures for sparse matrices (matrices with lots of zeros), such as scipy.sparse.csc_matrix. A sparse sketch follows below.
  2. Modify your algorithm to work on submatrices. You can read from disk only the matrix blocks that are currently being used in computations. Algorithms designed to run on clusters usually work blockwise, since the data is scattered across different computers and passed around only when needed. A memory-mapped, blockwise sketch also follows below.
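As a rough illustration of the first approach, assuming your similarity matrix is mostly zeros (the indices and values below are just placeholders):

import numpy
import scipy.sparse

size_of_similarity_M = 80000

# A LIL matrix is cheap to build incrementally; only nonzero entries use memory.
similarity_M = scipy.sparse.lil_matrix(
    (size_of_similarity_M, size_of_similarity_M), dtype=numpy.float64)

# Fill in only the entries you actually compute (hypothetical example values).
similarity_M[0, 1] = 0.73
similarity_M[1, 0] = 0.73

# Convert to CSC once construction is done, for fast arithmetic and slicing.
similarity_M = similarity_M.tocsc()

This only pays off if most entries really are zero; a dense similarity matrix gains nothing from a sparse format.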

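If the matrix is genuinely dense, the second approach can use numpy.memmap to keep the array on disk and process it block by block. A minimal sketch, where the file name, block size, and the fill-in computation are placeholders:

import numpy

size_of_similarity_M = 80000
block = 1000  # rows per chunk; tune to the RAM you have available

# The array lives in a file on disk; only the slices you touch are paged into RAM.
similarity_M = numpy.memmap('similarity_M.dat', dtype=numpy.float64, mode='w+',
                            shape=(size_of_similarity_M, size_of_similarity_M))

# Work on one block of rows at a time instead of the whole matrix.
for start in range(0, size_of_similarity_M, block):
    stop = min(start + block, size_of_similarity_M)
    # Placeholder: replace with your real similarity computation for these rows.
    similarity_M[start:stop, :] = 0.0

similarity_M.flush()  # make sure the results are written to disk

Since a memmap behaves like an ordinary ndarray for indexing and slicing, your other script can often open the same file with mode='r' and keep treating similarity_M as before.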
Upvotes: 1
