meTchaikovsky

Reputation: 7666

Is there a way to not make a copy when a numpy array is sliced?

I need to handle some large numpy arrays in my project. After such an array is loaded from the disk, over half of my computer's memory will be consumed.

After the array is loaded, I take several slices of it (together selecting almost half of the array), and then I get an error telling me there is insufficient memory.

From a little experiment, I understand that I get the error because a copy is created when a numpy array is sliced:

import numpy as np

tmp = np.linspace(1, 100, 100)
inds = list(range(100))
tmp_slice = tmp[inds]

assert id(tmp) == id(tmp_slice)

This raises an AssertionError.

Is there a way to make a slice of a numpy array that only refers to the memory of the original array, so that the data entries are not copied?

Upvotes: 1

Views: 3057

Answers (2)

hpaulj

Reputation: 231385

In Python, slice is a well-defined class with start, stop, and step values. It is used when we index a list with alist[1:10:2], which makes a new list with copies of the pointers from the original. In numpy, slices are used in basic indexing, e.g. arr[:3, -3:]. This creates a view of the original. The view shares the data buffer, but has its own shape and strides.

But when we index an array with a list, an array, or a boolean array (mask), numpy has to make a copy: an array with its own data buffer. The selection of elements is too complex or irregular to express in terms of the shape and strides attributes.

In some cases the index array is small (compared to the original) and the copy is also small. But if we are permuting the whole array, then the index array and the copy will both be as large as the original.
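
To make the view/copy distinction concrete, here is a minimal sketch. The array contents and the variable names (arr, view, copy) are made up for illustration; np.shares_memory is used to check whether two arrays overlap in memory.

```python
import numpy as np

# Illustrative 3x4 array that owns its own data buffer.
arr = np.zeros((3, 4))

# Basic indexing with slices: a view with its own shape and strides.
view = arr[::2, ::2]

# Advanced indexing with a list of row indices: a new data buffer.
copy = arr[[0, 2]]

view_shares = np.shares_memory(arr, view)
copy_shares = np.shares_memory(arr, copy)

print("view shares memory with arr:", view_shares)
print("copy shares memory with arr:", copy_shares)
print("arr  shape/strides:", arr.shape, arr.strides)
print("view shape/strides:", view.shape, view.strides)
```

The view only changes how the existing buffer is walked (a different shape and larger strides), which is why no data needs to be copied.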
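
As a rough illustration of the memory cost, the sketch below compares a small index list with a full permutation. The array size and the seed are arbitrary choices for the example, not values from the question.

```python
import numpy as np

# Illustrative array of one million float64 values.
arr = np.linspace(1, 100, 1_000_000)

# A small index list gives a small copy.
small = arr[[0, 10, 20]]

# A full permutation needs an index array as long as arr,
# and the result is a copy as large as arr.
perm = np.random.default_rng(0).permutation(arr.size)
shuffled = arr[perm]

print("original bytes:   ", arr.nbytes)
print("small copy bytes: ", small.nbytes)
print("index array bytes:", perm.nbytes)
print("shuffled bytes:   ", shuffled.nbytes)
```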

Upvotes: 3

gstukelj

Reputation: 2551

Reading through this, this, and this, I think your problem is that you are using advanced indexing. To reiterate one of the answers, the numpy docs clearly state that

Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).

So instead of doing:

inds = list(range(100))
tmp_slice = tmp[inds]

you should rather use:

tmp_slice = tmp[:100]

This will result in a view rather than a copy. You can notice the difference by trying:

tmp[0] = 5

In the first case tmp_slice[0] will still be 1.0, but in the second it will be 5.0.
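
Both cases can be checked side by side with the sketch below. The names copied and viewed are introduced here for illustration. It also shows that comparing id() values, as in the question, cannot detect a view: id() compares Python objects, and a view is always a new array object. np.shares_memory is a more reliable check.

```python
import numpy as np

tmp = np.linspace(1, 100, 100)

copied = tmp[list(range(100))]  # advanced indexing: independent copy
viewed = tmp[:100]              # basic slicing: view on tmp's buffer

# Writing to the original is visible through the view only.
tmp[0] = 5
print(copied[0], viewed[0])  # 1.0 5.0

# The view is a different Python object, so the ids differ
# even though no data was copied.
ids_differ = id(tmp) != id(viewed)
view_shares = np.shares_memory(tmp, viewed)
copy_shares = np.shares_memory(tmp, copied)
print(ids_differ, view_shares, copy_shares)
```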

Upvotes: 1
