Reputation: 23
If I try to execute:
np.empty(shape= (108698,200,1000))
in my Jupyter notebook, it throws an error:
MemoryError Traceback (most recent call last)
<ipython-input-35-0aedb09803e9> in <module>()
1 import numpy as np
2 #np.empty(shape=(108698-0,200,1000))
----> 3 np.empty(shape= (108698,200,1000))
4 #np.empty(shape=(end-start,n_words,embedding_size))
But when I try to execute
np.empty(shape= (84323,200,1000))
it executes without any errors.
So is there any possible way to run
np.empty(shape= (108698,200,1000))
without increasing the RAM of my machine?
Upvotes: 0
Views: 4628
Reputation: 101
Well, there is no upper limit. We can (roughly) estimate the amount of memory required for an ndarray:
>>> arr = np.empty(shape=(100, 10, 1000), dtype='uint8')
>>> hr_size(arr.nbytes)
'976.6K'
For an ndarray with 1 million elements (each 'uint8' element takes one byte) we need '976.6K' of memory.
For an ndarray with shape=(84323, 200, 1000) and dtype='uint8':
>>> hr_size(84323*200*1000)
'15.7G'
we need more than 15G,
and finally, for an ndarray with shape=(108698, 200, 1000) and dtype='uint8':
>>> hr_size(108698*200*1000)
'20.2G'
we need more than 20G.
If the dtype is 'int64' (or the default 'float64'), the estimated amount of memory should be multiplied by eight.
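The hr_size helper used above is not part of NumPy; a minimal sketch of such a human-readable size formatter, assuming binary (1024-based) units to match the outputs shown, could look like this:
def hr_size(num_bytes):
    # format a byte count with binary (1024-based) units, e.g. '15.7G'
    for unit in ('B', 'K', 'M', 'G', 'T'):
        if num_bytes < 1024:
            return f'{num_bytes:.1f}{unit}'
        num_bytes /= 1024
    return f'{num_bytes:.1f}P'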
Upvotes: 0
Reputation: 2094
There is no upper limit defined for shape, but the whole size of the array is limited to numpy.intp, which is normally int32 or int64.
You can either use a sparse matrix from SciPy or limit the dtype of your large (108698, 200, 1000) array to int8, which should work.
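For example (a sketch, assuming int8 really is enough for your values), an int8 array of that shape needs one byte per element, roughly 20 GB instead of about 174 GB for the default float64:
import numpy as np

# one byte per element: 108698 * 200 * 1000 bytes, roughly 20 GB
x = np.empty(shape=(108698, 200, 1000), dtype=np.int8)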
Upvotes: 0
Reputation: 23637
You can work with arrays that do not fit into memory by using memory-mapped files. NumPy has facilities for this: numpy.memmap.
E.g.:
x = np.memmap('test.bin', mode='w+', shape=(108698,200,1000))
However, on 32-bit Python the files are still limited to 2 GB.
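A rough sketch of how such a memmap might then be filled without holding the whole array in RAM (the chunk size here is arbitrary; note that np.memmap defaults to dtype uint8, one byte per element):
# fill the memory-mapped array chunk by chunk, then write changes to disk
for start in range(0, 108698, 1000):
    x[start:start + 1000] = 1
x.flush()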
Upvotes: 1
Reputation: 2826
No. While it depends on what you're running, once you have reached the maximum available memory you can't just allocate more. For example, if you're running 64-bit NumPy, at 8 bytes per entry that array would be about 174 GB in all, which is far too much space. If you know your data entries and are willing to use something besides NumPy, you could look into sparse arrays. Sparse arrays store only the nonzero elements and their position indices, which could potentially save you a lot of space.
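As a rough sketch of the idea (note that scipy.sparse only handles 2-D matrices, so a (108698, 200, 1000) array would first have to be reshaped, e.g. to (108698, 200*1000); the positions and values below are made up):
from scipy import sparse

# store only the nonzero entries and their coordinates
rows = [0, 5, 108697]
cols = [3, 199999, 42]
vals = [1.5, 2.0, -3.0]
s = sparse.coo_matrix((vals, (rows, cols)), shape=(108698, 200 * 1000))
print(s.nnz)  # 3 stored elements instead of ~2.2e10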
Upvotes: 2