Angelika

Reputation: 47

Memory error in python with numpy.pad function

I read a CSV file in Python and create a 4664605 x 4 array. I want a square matrix, so I use the numpy.pad function (with constant value = 0) to create a 4664605 x 4664605 matrix. But I get the following error message:

Traceback (most recent call last):
  File "C:\Users\Angelika\Eclipse\Projects\vonNeumann\vonNeumann.py", line 7, in <module>
    A_new = np.pad(A, ((0,0),(0,4664601)), 'constant',constant_values=(0))
  File "C:\Anaconda\lib\site-packages\numpy\lib\arraypad.py", line 1394, in pad
    newmat = _append_const(newmat, pad_after, after_val, axis)
  File "C:\Anaconda\lib\site-packages\numpy\lib\arraypad.py", line 138, in _append_const
    return np.concatenate((arr, np.zeros(padshape, dtype=arr.dtype)),
MemoryError

I have checked the maximum size on my system in case of overflow, but it is fine: sys.maxsize = 9223372036854775807 and the matrix size is 21758539806025 elements. Appending rows works without problems; the result is a 9329210 x 4 array. But I can't add 4664601 columns to get a square matrix. I don't know what to do.

Thank you very much, Angelika

Upvotes: 1

Views: 1383

Answers (1)

hpaulj

Reputation: 231540

This is more of a question than an answer, but it's too long for a comment.

The distinction between a 4664605 x 4 array and 4664605 x 4664605 matrix doesn't make much sense. Squareness does not define a matrix, at least not in most contexts.

What's the purpose of adding so many zero-filled columns to this array? Even if you had enough memory to create an array that big, would you have enough to hold several copies (as needed for math and many other operations)?

The error line:

return np.concatenate((arr, np.zeros(padshape, dtype=arr.dtype))

arr must be (4664605, 4) in shape, and padshape (4664605, 4664601). So it is trying to make a zeros array of padshape size, and then make a new array of the final size. Simply constructing the result therefore requires space for 2 very large arrays.
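To put rough numbers on this, here is a minimal sketch that estimates the bytes needed for the zero block and for the final array. The question does not state the dtype, so float64 (8 bytes per element) is an assumption, and dense_nbytes is a hypothetical helper name used only for illustration.

```python
import numpy as np

def dense_nbytes(shape, dtype=np.float64):
    """Bytes needed to hold a dense array of the given shape and dtype."""
    n_items = 1
    for dim in shape:
        n_items *= int(dim)  # Python ints, so no overflow
    return n_items * np.dtype(dtype).itemsize

rows, cols = 4664605, 4      # shape of the original array
pad_cols = rows - cols       # 4664601 columns of zeros to append

# Assumption: float64 elements; the real dtype is not given in the question.
pad_bytes = dense_nbytes((rows, pad_cols))   # the temporary np.zeros(padshape)
final_bytes = dense_nbytes((rows, rows))     # the concatenated result

tib = 2.0 ** 40
print("zero block: %.1f TiB" % (pad_bytes / tib))
print("final array: %.1f TiB" % (final_bytes / tib))
print("both alive at once: %.1f TiB" % ((pad_bytes + final_bytes) / tib))
```

Both arrays exist at the same time during the concatenate, so the peak requirement is roughly double the size of the final array, whatever the dtype.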

You might save a bit of space by doing the pad directly, allocating the final array once and copying arr into it:

res = np.zeros((4664605, 4664605), dtype=arr.dtype)
res[:,:4] = arr
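As a quick sanity check at a size that fits in memory, the sketch below wraps the direct allocation in a small function and compares it with np.pad. The function name pad_columns_direct and the 5 x 4 test array are hypothetical, chosen only for illustration.

```python
import numpy as np

def pad_columns_direct(arr, total_cols):
    """Allocate the final array once and copy arr into its leading columns."""
    res = np.zeros((arr.shape[0], total_cols), dtype=arr.dtype)
    res[:, :arr.shape[1]] = arr
    return res

# Small stand-in for the real data: 5 x 4 padded out to 5 x 5.
small = np.arange(20).reshape(5, 4)
direct = pad_columns_direct(small, 5)
padded = np.pad(small, ((0, 0), (0, 1)), 'constant', constant_values=0)

# Both approaches should produce the same array.
print(np.array_equal(direct, padded))
```

This avoids the temporary zero block, but the final array is just as large, so it will not help at the size in the question.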

But still - why make such a big array that is mostly zero?
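If a square shape really is needed, one option not covered above is a sparse matrix, which stores only the nonzero entries. The following is a minimal sketch assuming SciPy is available and that the later operations work on scipy.sparse matrices; pad_columns_sparse is a hypothetical helper, shown on a small array.

```python
import numpy as np
from scipy import sparse

def pad_columns_sparse(arr, total_cols):
    """Return arr widened to total_cols columns as a CSR sparse matrix.

    Only the nonzero entries of arr are stored; the padded columns
    cost no memory beyond the recorded shape.
    """
    rows, cols = np.nonzero(arr)   # coordinates of the nonzero entries
    data = arr[rows, cols]         # their values
    coo = sparse.coo_matrix((data, (rows, cols)),
                            shape=(arr.shape[0], total_cols))
    return coo.tocsr()

# Small stand-in for the real data: 5 x 4 widened to 5 x 5.
small = np.arange(20).reshape(5, 4)
wide = pad_columns_sparse(small, 5)
print(wide.shape, wide.nnz)
```

For the array in the question this would store at most 4664605 x 4 values plus their index arrays, rather than all 4664605 x 4664605 elements. Whether that helps depends on what is done with the matrix afterwards.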

Upvotes: 2
