Rahul Murmuria

Reputation: 437

Numpy: signed values of element-wise absolute maximum of a 2D array

Let us assume that I have a 2D array named arr of shape (4, 3) as follows:

>>> arr
array([[ nan,   1., -18.],
       [ -1.,  -1.,  -1.],
       [  1.,   1.,   5.],
       [  1.,  -1.,   0.]])

Say I would like to assign the signed values of the element-wise absolute maximum of (1.0, 1.0, -15.0) and the rows arr[[0, 2], :] back to arr. That is, I am looking for the output:

>>> arr
array([[  1.,   1., -18.],
       [ -1.,  -1.,  -1.],
       [  1.,   1., -15.],
       [  1.,  -1.,   0.]])

The closest thing I found in the API reference for this is numpy.fmax, but it does not compare absolute values. If I used:

arr[index_list, :] = np.fmax(arr[index_list, :], new_tuple)

my array would finally look like:

>>> arr
array([[  1.,   1., -15.],
       [ -1.,  -1.,  -1.],
       [  1.,   1.,   5.],
       [  1.,  -1.,   0.]])

Now, the API says that this function is

equivalent to np.where(x1 >= x2, x1, x2) when neither x1 nor x2 are NaNs, but it is faster and does proper broadcasting

I tried using the following:

arr[index_list, :] = np.where(np.absolute(arr[index_list, :]) >= np.absolute(new_tuple), 
                              arr[index_list, :], new_tuple)

Although this produced the desired output, I got the warning:

/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevconsole.py:1: RuntimeWarning: invalid value encountered in greater_equal

I believe this warning is caused by the NaN, which is not handled gracefully here, unlike in the np.fmax function. In addition, the API docs mention that np.fmax is faster and does broadcasting correctly (I am not sure what part of broadcasting is missing in the np.where version).
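As an aside, the warning can be silenced locally with np.errstate (a workaround sketch, not part of the original question). The comparison against NaN simply evaluates to False, so the NaN entry falls through to the new_tuple value anyway:

```python
import numpy as np

arr = np.array([[np.nan,  1., -18.],
                [ -1.,   -1.,  -1.],
                [  1.,    1.,   5.],
                [  1.,   -1.,   0.]])
new_tuple = (1.0, 1.0, -15.0)
index_list = [0, 2]

# Suppress "invalid value encountered in greater_equal" for this block only
with np.errstate(invalid='ignore'):
    sub = arr[index_list, :]
    arr[index_list, :] = np.where(np.abs(sub) >= np.abs(new_tuple),
                                  sub, new_tuple)
```

This produces the desired output from above, with the NaN in row 0 replaced by 1.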

In conclusion, what I am looking for is something similar to:

arr[index_list, :] = np.fmax(arr[index_list, :], new_tuple, key=abs)

Unfortunately, no such key argument is available for this function.

Just for context, I am interested in the fastest possible solution because my actual arr has a shape of about (100000, 50), and I am looping through almost 1000 new_tuple tuples (each tuple equal in length to the number of columns of arr, of course). The index_list changes for each new_tuple.


Edit 1:

One possible solution is to begin by replacing all NaNs in arr with 0, i.e. arr[np.isnan(arr)] = 0. After this, I can use the np.where with np.absolute trick mentioned in my original text. However, this is probably a lot slower than np.fmax, as the API suggests.
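A minimal sketch of this NaN-zeroing idea, assuming zeros are an acceptable substitute for NaN in arr:

```python
import numpy as np

arr = np.array([[np.nan,  1., -18.],
                [ -1.,   -1.,  -1.],
                [  1.,    1.,   5.],
                [  1.,   -1.,   0.]])
new_tuple = (1.0, 1.0, -15.0)
index_list = [0, 2]

arr[np.isnan(arr)] = 0           # replace all NaNs up front, once
sub = arr[index_list, :]
arr[index_list, :] = np.where(np.abs(sub) >= np.abs(new_tuple),
                              sub, new_tuple)
```

With the NaN gone, the comparison is well-defined and no warning is raised.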


Edit 2:

The index_list may have repeated indexes across subsequent loops. Every new_tuple comes with a corresponding rule, and the index_list is selected based on that rule. There is nothing stopping different rules from matching overlapping indexes. @Divakar has an excellent answer for the case where index_list has no repeats. Other solutions covering both cases are nevertheless welcome.
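For the overlapping-index case, one hypothetical fallback is to process the (new_tuple, index_list) pairs sequentially, so later rules see the values written by earlier ones (slower than a fully vectorized solution, but correct in the presence of repeats):

```python
import numpy as np

arr = np.array([[np.nan,  1., -18.],
                [ -1.,   -1.,  -1.],
                [  1.,    1.,   5.],
                [  1.,   -1.,   0.]])

new_tuples = [(1.0, 1.0, -15.0), (2.0, -20.0, 1.0)]
index_lists = [[0, 2], [2, 3]]   # note the repeated index 2

for new_tuple, index_list in zip(new_tuples, index_lists):
    sub = arr[index_list, :]
    with np.errstate(invalid='ignore'):  # comparisons with NaN yield False
        arr[index_list, :] = np.where(np.abs(sub) > np.abs(new_tuple),
                                      sub, new_tuple)
```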

Upvotes: 1

Views: 976

Answers (1)

Divakar

Reputation: 221614

Assuming that the combined index_lists contain no repeated indexes:

Approach #1

I would propose a more fully vectorized solution, once we have all of the index_lists and new_tuples stored in one place, preferably as lists. As such, this could be the preferred approach if we are dealing with lots of such tuples and lists.

So, let's say we have them stored as follows:

new_tuples = [(1.0, 1.0, -15.0), (6.0, 3.0, -4.0)]  # list of all new_tuple
index_lists = [[0, 2], [4, 1, 6]]                   # list of all index_list

The idea is to replace broadcasting with an explicit repeat, and then use np.where as shown later in the question. As for the warning that np.where raises, it can safely be ignored as long as the new_tuples contain no NaNs: a comparison against NaN evaluates to False, so the new value is picked. Thus, the solution would be -

idx = np.concatenate(index_lists)
lens = list(map(len, index_lists))

a = arr[idx]
b = np.repeat(new_tuples, lens, axis=0)
arr[idx] = np.where(np.abs(a) > np.abs(b), a, b)
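A runnable sketch of this approach, on a hypothetical 7-row arr (indexes 4 and 6 require more rows than the question's example has), with the NaN warning suppressed via np.errstate for tidiness:

```python
import numpy as np

arr = np.array([[np.nan,  1., -18.],
                [ -1.,   -1.,  -1.],
                [  1.,    1.,   5.],
                [  1.,   -1.,   0.],
                [  2.,   -2.,   3.],
                [  0.,    0.,   0.],
                [ -7.,    4.,   1.]])

new_tuples = [(1.0, 1.0, -15.0), (6.0, 3.0, -4.0)]
index_lists = [[0, 2], [4, 1, 6]]

idx = np.concatenate(index_lists)         # all affected row indexes, flattened
lens = list(map(len, index_lists))        # how many rows each tuple covers

a = arr[idx]                              # gather all affected rows at once
b = np.repeat(new_tuples, lens, axis=0)   # repeat each tuple for its rows
with np.errstate(invalid='ignore'):       # NaN comparison is False -> b wins
    arr[idx] = np.where(np.abs(a) > np.abs(b), a, b)
```

Note that the NaN in row 0 is replaced by the corresponding tuple value, matching np.fmax's graceful NaN handling.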

Approach #2

Another approach would be to store the absolute values of arr beforehand: abs_arr = np.abs(arr), and use those within np.where. This should save a lot of time within the loop. Thus, the relevant computation inside the loop would reduce to:

arr[index_list, :] = np.where(abs_arr[index_list, :] > np.abs(new_tuple), arr[index_list, :], new_tuple)
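A usage sketch of this precomputation idea; note that it relies on the no-repeated-indexes assumption, since abs_arr is not refreshed after rows are overwritten:

```python
import numpy as np

arr = np.array([[np.nan,  1., -18.],
                [ -1.,   -1.,  -1.],
                [  1.,    1.,   5.],
                [  1.,   -1.,   0.]])

abs_arr = np.abs(arr)            # precomputed once, before the loop

new_tuple = (1.0, 1.0, -15.0)
index_list = [0, 2]

# inside the loop: no per-iteration np.abs(arr) recomputation needed
sub = arr[index_list, :]
with np.errstate(invalid='ignore'):  # abs_arr still contains NaN
    arr[index_list, :] = np.where(abs_arr[index_list, :] > np.abs(new_tuple),
                                  sub, new_tuple)
```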

Upvotes: 2
