a_guest

Reputation: 36249

Modify array based on index array and condition

Situation

I have a one-dimensional array, for example:

>>> a
array([  0.,   1.,  nan,  nan,   4.,  nan,   6.,  nan,   8.,   9.])

I also have an index array which indicates the relevant parts of a, for example:

>>> index
array([0, 2, 4, 6, 8])

Now I want to modify those elements of a that are pointed to by index and that fulfill a specific condition, here numpy.isnan, by setting them to zero.

Because indexing with an index array returns a copy rather than a view, I cannot simply use

>>> sub = a[index]
>>> sub[numpy.isnan(sub)] = 0

This only modifies the copy sub but not the original array.
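The copy semantics described above can be checked directly. The following is a minimal sketch using the example data from the question; the variable names shares and still_nan are only illustrative.

```python
import numpy

a = numpy.array([0., 1., numpy.nan, numpy.nan, 4., numpy.nan, 6., numpy.nan, 8., 9.])
index = numpy.array([0, 2, 4, 6, 8])

sub = a[index]                 # integer-array indexing returns a copy, not a view
sub[numpy.isnan(sub)] = 0      # modifies only the copy

# sub owns its own buffer, so the original array is untouched.
shares = numpy.shares_memory(a, sub)
still_nan = numpy.isnan(a[2])
print(shares, still_nan)
```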

Workarounds

Back-copy sub

Copying the updated array sub to a[index]:

>>> sub[numpy.isnan(sub)] = 0
>>> a[index] = sub

This works; however, if the sub-array is large and only a few elements have been updated, it involves a lot of unnecessary copying.

Create a combined mask

I can turn the index array into a boolean array via

>>> mask = numpy.zeros(a.size, dtype=bool)
>>> mask[index] = True

and update the original array via

>>> a[mask & numpy.isnan(a)] = 0

Similarly I can create a combined index array via

>>> idx = numpy.intersect1d(index, numpy.flatnonzero(numpy.isnan(a)), assume_unique=True)
>>> a[idx] = 0

However, both ways involve checking the whole array a for the condition, which again means a lot of unnecessary operations because only a small part of that array is of interest.

Question

Is there a more efficient way to modify an array based on an index array and a condition which reduces the amount of unnecessary operations?

In other words: The two workarounds above have both pros and cons. The first approach eliminates unnecessary condition-checks but (potentially) involves unnecessary copying. The second approach eliminates unnecessary copying but (potentially) involves unnecessary condition-checks. So is there a method that combines the advantages of both approaches and thus eliminates both unnecessary copying and unnecessary condition-checks?

Upvotes: 2

Views: 74

Answers (2)

user2357112

Reputation: 280887

One option would be to extract a[index] to test the predicate, select the matching values of index, and index again:

a[index[np.isnan(a[index])]] = 0

You may want to test your expectations about what operations will actually be expensive, though. Unnecessary predicate tests or unnecessary copies may not be that expensive, and if you're going to use NumPy, you're going to have to get used to unnecessary copies. NumPy loves its giant scratch arrays.
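Broken into steps, the one-liner above might look like the following sketch. It reuses the example data from the question; the intermediate names hits and targets are introduced here purely for illustration.

```python
import numpy as np

a = np.array([0., 1., np.nan, np.nan, 4., np.nan, 6., np.nan, 8., 9.])
index = np.array([0, 2, 4, 6, 8])

hits = np.isnan(a[index])   # predicate evaluated on len(index) elements only
targets = index[hits]       # positions in a that are both indexed and NaN
a[targets] = 0              # in-place assignment touches just those positions

print(targets)
print(a)
```

Note that a[index] still creates a temporary copy of the selected elements, but the predicate is never evaluated on the rest of a, and only the matching positions are written back.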

Upvotes: 2

Alex

Reputation: 790

You can just write your own loop that iterates over the relevant indices, checks the array elements at those locations, and updates them if necessary.

for k in index:
    if numpy.isnan(a[k]):
        a[k] = 0
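Whether an explicit Python loop beats a vectorized expression depends on the array sizes, so it may be worth measuring. Below is a rough sketch, assuming randomly generated test data (the sizes and NaN fraction are arbitrary) and two hypothetical helper functions, with_loop and with_fancy_indexing, that wrap the loop and the a[index[numpy.isnan(a[index])]] = 0 expression respectively. No particular timing outcome is assumed.

```python
import timeit
import numpy

rng = numpy.random.default_rng(0)
a0 = rng.random(100_000)
a0[rng.random(a0.size) < 0.3] = numpy.nan     # sprinkle NaNs (arbitrary fraction)
index = numpy.sort(rng.choice(a0.size, size=1_000, replace=False))

def with_loop(a, index):
    # Explicit loop: check and update one indexed element at a time.
    for k in index:
        if numpy.isnan(a[k]):
            a[k] = 0
    return a

def with_fancy_indexing(a, index):
    # Vectorized: test only a[index], then assign to the matching positions.
    a[index[numpy.isnan(a[index])]] = 0
    return a

# Both variants should produce identical arrays (NaNs compared as equal).
same = numpy.allclose(with_loop(a0.copy(), index),
                      with_fancy_indexing(a0.copy(), index),
                      equal_nan=True)

# Each timing includes the cost of a0.copy(), which is the same for both.
t_loop = timeit.timeit(lambda: with_loop(a0.copy(), index), number=20)
t_fancy = timeit.timeit(lambda: with_fancy_indexing(a0.copy(), index), number=20)
print(f"same result: {same}  loop: {t_loop:.4f}s  fancy indexing: {t_fancy:.4f}s")
```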

Upvotes: 0
