user30

Reputation: 57

Working with large 2D arrays in python

I have to initialize a large 2D array for some calculations. I am getting a "MemoryError" when I run the code. The code is given below:

import numpy as np

a = np.zeros((200000, 200000))  ## I get the MemoryError on this line

for i in range (0,len(rows)):
    for j in range (0,len(rows)):
        if pq[rows[i],cols[j]]>0:
            a[rows[i],cols[j]]=1
        else:
            a[rows[i],cols[j]]=0

Here, 'rows' and 'cols' are 1D arrays of length 200000. The dimensions of pq are 433 x 800.

I am using a 64-bit Windows 10 system with an Intel® Core™ i7-4770S CPU @ 3.10GHz × 8 processor and 16 GB of RAM. I am using Python 2.7.12.

Any help with overcoming this issue would be appreciated. I am new to Python; thank you in advance.

Can this problem be overcome using PyTables or generators? I just read about them online.

Upvotes: 0

Views: 1664

Answers (2)

The problem is that your matrix is really huge. Even assuming 1 byte per cell (an underestimate: np.zeros defaults to float64, which takes 8 bytes per cell), your matrix would require 200000 * 200000 = 40 GB to store completely!

I would suggest you take a look at sparse matrices (for example, scipy.sparse). A sparse matrix stores only the non-zero values, which in your case would save a lot of space. A minimal sketch follows.
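Roughly, assuming rows, cols and pq are the arrays described in the question, something like this should work; the deduplication with np.unique is my own shortcut (duplicate index values produce identical assignments, and the indices must fit inside pq, so there are at most 433 * 800 distinct pairs):

    import numpy as np
    from scipy import sparse

    # Only the distinct index values matter; they index pq, so there are
    # at most 433 * 800 (row, col) pairs to consider.
    r = np.unique(rows)
    c = np.unique(cols)
    ri, ci = np.meshgrid(r, c, indexing='ij')   # every (row, col) combination
    mask = pq[ri, ci] > 0                       # cells that would hold a 1

    # Build a 200000 x 200000 matrix that stores only those 1s.
    a = sparse.coo_matrix(
        (np.ones(mask.sum(), dtype=np.int8), (ri[mask], ci[mask])),
        shape=(200000, 200000),
    ).tocsr()                                   # CSR is efficient for arithmetic and row slicing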

Upvotes: 2

Aleksandr Borisov

Reputation: 2212

First, you have not mentioned your Python architecture. If it is 32-bit, the process is limited to roughly 2 GB of RAM.

Second, 200000 * 200000 * 1 byte (the minimum, for a small integer type) is about 37 GiB, which is far more than your 16 GB of RAM, so you cannot allocate it in any way.

Third, your data is sparse: most of your array will be zeros. In that case, instead of allocating the full array, you should store only the coordinates of your non-zero data (you already have this information in pq, rows and cols) and rework your algorithm to use that representation, as in the sketch below.
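For illustration only (the value_at helper and the np.unique deduplication are my own additions, not part of the question's code), the coordinate representation could look like this:

    import numpy as np

    # Collect the (row, col) positions that would hold a 1;
    # everything else is implicitly 0.
    r = np.unique(rows)
    c = np.unique(cols)
    ri, ci = np.meshgrid(r, c, indexing='ij')
    mask = pq[ri, ci] > 0
    ones_rows = ri[mask]
    ones_cols = ci[mask]

    # Later lookups use a set of coordinate pairs instead of a dense array.
    ones = set(zip(ones_rows.tolist(), ones_cols.tolist()))

    def value_at(i, j):
        return 1 if (i, j) in ones else 0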

Upvotes: 3
