Reputation: 7666
I need to handle some large numpy arrays in my project. Loading one such array from disk consumes over half of my computer's memory.
After the array is loaded, I take several slices of it (selecting almost half of the array), and then I get an error telling me that memory is insufficient.
A small experiment suggests that I get the error because a copy is created when a numpy array is sliced:
import numpy as np
tmp = np.linspace(1, 100, 100)
inds = list(range(100))
tmp_slice = tmp[inds]
assert id(tmp) == id(tmp_slice)
This raises an AssertionError.
Is there a way to make a slice of a numpy array refer only to the memory of the original array, so that the data entries are not copied?
Upvotes: 1
Views: 3057
Reputation: 231385
In Python, slice is a well-defined class with start, stop, and step values. It is used when we index a list with alist[1:10:2]. This makes a new list with copies of the pointers from the original. In numpy, slices are used in basic indexing, e.g. arr[:3, -3:]. This creates a view of the original. The view shares the data buffer, but has its own shape and strides.
But when we index arrays with lists, arrays, or boolean arrays (masks), numpy has to make a copy: an array with its own data buffer. The selection of elements is too complex or irregular to express in terms of the shape and strides attributes.
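As a rough illustration of the distinction above, the sketch below indexes the same array once with a slice and once with a list. The variable names are made up for the example. Note that id() compares Python objects, so it differs even for a view; np.shares_memory is the check that looks at the underlying data buffer.

```python
import numpy as np

arr = np.linspace(1, 100, 100)

# Basic indexing: a slice object produces a view on arr's buffer.
as_view = arr[10:60:2]

# Advanced indexing: a list of the same positions produces a copy.
as_copy = arr[list(range(10, 60, 2))]

# Both results hold the same values...
print(np.array_equal(as_view, as_copy))

# ...but only the view shares memory with the original array.
print(np.shares_memory(arr, as_view))
print(np.shares_memory(arr, as_copy))

# The view expresses "every second element" purely through its strides.
print(arr.strides, as_view.strides)
```

The view needs no new data buffer because a regular start/stop/step pattern maps directly onto an offset and a stride; an arbitrary list of positions has no such representation.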
In some cases the index array is small (compared to the original) and the copy is also small. But if we are permuting the whole array, then the index array and the copy will both be as large as the original.
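To get a feel for the sizes involved, here is a small sketch that compares nbytes for a short index list and for a full permutation. The array size and names are arbitrary choices for the example, and the exact size of the index array depends on the platform's default integer type.

```python
import numpy as np

arr = np.zeros(1_000_000)  # float64, so 8 bytes per element

# A short index list gives a correspondingly small copy.
few = arr[[0, 10, 20]]

# A full permutation needs an index array with one entry per element,
# and the result is a copy as large as the original.
rng = np.random.default_rng(0)
perm = rng.permutation(arr.size)
shuffled = arr[perm]

print("original:", arr.nbytes)
print("small copy:", few.nbytes)
print("index array:", perm.nbytes)
print("permuted copy:", shuffled.nbytes)
```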
Upvotes: 3
Reputation: 2551
Reading through this, this, and this, I think your problem is in using advanced indexing. To reiterate one of the answers, the numpy docs clearly state that:
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
So instead of doing:
inds = list(range(100))
tmp_slice = tmp[inds]
you should rather use:
tmp_slice = tmp[:100]
This will result in a view rather than a copy. You can notice the difference by trying:
tmp[0] = 5
In the first case tmp_slice[0] will return 1.0, but in the second it will return 5.0.
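Putting both cases side by side, a minimal runnable version of this experiment might look as follows (copy_slice and view_slice are names chosen here to keep the two results apart):

```python
import numpy as np

tmp = np.linspace(1, 100, 100)

# Advanced indexing with a list: an independent copy.
copy_slice = tmp[list(range(100))]

# Basic slicing: a view onto tmp's data.
view_slice = tmp[:100]

# Modify the original after both slices were taken.
tmp[0] = 5

print(copy_slice[0])  # 1.0, the copy kept the old value
print(view_slice[0])  # 5.0, the view sees the change
```

The same sharing works in the other direction: writing to view_slice would also change tmp, which is worth keeping in mind when views are used to save memory.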
Upvotes: 1