Reputation: 36249
I have a one-dimensional array, for example:
>>> a
array([ 0., 1., nan, nan, 4., nan, 6., nan, 8., 9.])
I also have an index array which indicates the relevant parts of a, for example:
>>> index
array([0, 2, 4, 6, 8])
Now I want to modify those parts of a which are pointed to by index and which fulfill a specific condition, namely numpy.isnan (set them to zero).
Because indexing with an index array returns a copy, I cannot simply use
>>> sub = a[index]
>>> sub[numpy.isnan(sub)] = 0
This only modifies the copy sub, but not the original array.
One workaround is copying the updated array sub back to a[index]:
>>> sub[numpy.isnan(sub)] = 0
>>> a[index] = sub
This works, but if the sub-array is large and only a few elements have been updated, it involves a lot of unnecessary copying.
I can turn the index array into a boolean array via
>>> mask = numpy.zeros(a.size, dtype=bool)
>>> mask[index] = True
and update the original array via
>>> a[mask & numpy.isnan(a)] = 0
Similarly I can create a combined index array via
>>> mask = numpy.intersect1d(index, numpy.where(numpy.isnan(a))[0], assume_unique=True)
>>> a[mask] = 0
However, both ways involve checking the whole array a for the condition, which again involves lots of unnecessary operations because only a small part of that array is of interest.
Is there a more efficient way to modify an array based on an index array and a condition which reduces the amount of unnecessary operations?
In other words: The two workarounds above have both pros and cons. The first approach eliminates unnecessary condition-checks but (potentially) involves unnecessary copying. The second approach eliminates unnecessary copying but (potentially) involves unnecessary condition-checks. So is there a method that combines the advantages of both approaches and thus eliminates both unnecessary copying and unnecessary condition-checks?
Upvotes: 2
Views: 74
Reputation: 280887
One option would be to extract a[index] to test the predicate, select the matching values of index, and index again:
a[index[np.isnan(a[index])]] = 0
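To make the individual steps visible, the one-liner can be unpacked as in the sketch below. It reuses the example arrays from the question; the intermediate names hits and targets are only illustrative and assume that numpy is imported as np.

```python
import numpy as np

# Example data from the question.
a = np.array([0., 1., np.nan, np.nan, 4., np.nan, 6., np.nan, 8., 9.])
index = np.array([0, 2, 4, 6, 8])

sub = a[index]          # copy of the selected elements only (len(index) values)
hits = np.isnan(sub)    # predicate is evaluated on the selection, not on all of a
targets = index[hits]   # positions in a that are both selected and NaN
a[targets] = 0          # writes directly into the original array

print(targets)
print(a)
```

Only the elements at index are copied and tested, and only the matching positions are written back. NaN values at positions that are not listed in index (here 3, 5 and 7) are left untouched.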
You may want to test your expectations about what operations will actually be expensive, though. Unnecessary predicate tests or unnecessary copies may not be that expensive, and if you're going to use NumPy, you're going to have to get used to unnecessary copies. NumPy loves its giant scratch arrays.
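One way to test those expectations is a small timing comparison, sketched below. The array size, the NaN fraction, the number of indices, and the function names via_index, via_mask, via_copy and run are all arbitrary assumptions chosen for illustration. The measured times depend on the machine and on the data, so no results are claimed here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000                       # assumed array size
base = rng.random(n)
base[rng.random(n) < 0.1] = np.nan  # assumed: roughly 10% NaN
index = rng.choice(n, size=1_000, replace=False)  # assumed: few unique indices

def via_index(a):
    # Index, test the predicate on the selection, index again.
    a[index[np.isnan(a[index])]] = 0

def via_mask(a):
    # Boolean mask over the whole array (second workaround in the question).
    mask = np.zeros(a.size, dtype=bool)
    mask[index] = True
    a[mask & np.isnan(a)] = 0

def via_copy(a):
    # Modify the copy and write all of it back (first workaround).
    sub = a[index]
    sub[np.isnan(sub)] = 0
    a[index] = sub

def run(fn):
    # Apply fn to a fresh copy so every variant sees the same input.
    a = base.copy()
    fn(a)
    return a

for fn in (via_index, via_mask, via_copy):
    # Note: each timed call includes the cost of base.copy().
    seconds = timeit.timeit(lambda: run(fn), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

Because every timed call also copies the input array, the numbers are only useful for comparing the variants with each other, not as absolute costs.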
Upvotes: 2
Reputation: 790
You can just write your own loop: iterating over the relevant indices, checking the array elements at those locations, and updating them if necessary.
for k in index:
    if numpy.isnan(a[k]):
        a[k] = 0
Upvotes: 0