vlro
vlro

Reputation: 43

Numpy dynamic array slicing based on min/max values

I have a 3 dimensional array of hape (365, x, y) where 36 corresponds to =daily data. In some cases, all the elements along the time axis axis=0 are np.nan.

The time series for each point along the axis=0 looks something like this:

Image

I need to find the index at which the maximum value (peak data) occurs and then the two minimum values on each side of the peak.

import numpy as np

a = np.random.random(365, 3, 3) * 10
a[:, 0, 0] = np.nan

peak_mask = np.ma.masked_array(a, np.isnan(a))
peak_indexes = np.nanargmax(peak_mask, axis=0)

I can find the minimum before the peak using something like this:

early_minimum_indexes = np.full_like(peak_indexes, fill_value=0)

for i in range(peak_indexes.shape[0]):
    for j in range(peak_indexes.shape[1]):
        if peak_indexes[i, j] == 0:
            early_minimum_indexes[i, j] = 0
        else:
            early_mask = np.ma.masked_array(a, np.isnan(a))
            early_loc = np.nanargmin(early_mask[:peak_indexes[i, j], i, j], axis=0)   
            early_minimum_indexes[i, j] = early_loc

With the resulting peak and trough plotted like this:

Image

This approach is very unreasonable time-wise for large arrays (1m+ elements). Is there a better way to do this using numpy?

Upvotes: 3

Views: 1232

Answers (2)

Mad Physicist
Mad Physicist

Reputation: 114420

While using masked arrays may not be the most efficient solution in this case, it will allow you to perform masked operations on specific axes while more-or-less preserving shape, which is a great convenience. Keep in mind that in many cases, the masked functions will still end up copying the masked data.

You have mostly the right idea in your current code, but you missed a couple of tricks, like being able to negate and combine masks. Also the fact that allocating masks as boolean up front is more efficient, and little nitpicks like np.full(..., 0) -> np.zeros(..., dtype=bool).

Let's work through this backwards. Let's say you had a well-behaved 1-D array with a peak, say a1. You can use masking to easily find the maxima and minima (or indices) like this:

peak_index = np.nanargmax(a1)
mask = np.zeros(a1.size, dtype=np.bool)
mask[peak:] = True
trough_plus = np.nanargmin(np.ma.array(a1, mask=~mask))
trough_minus = np.nanargmin(np.ma.array(a1, mask=mask))

This respects the fact that masked arrays flip the sense of the mask relative to normal numpy boolean indexing. It's also OK that the maximum value appears in the calculation of trough_plus, since it's guaranteed not to be a minimum (unless you have the all-nan situation).

Now if a1 was a masked array already (but still 1D), you could do the same thing, but combine the masks temporarily. For example:

a1 = np.ma.array(a1, mask=np.isnan(a1))
peak_index = a1.argmax()
mask = np.zeros(a1.size, dtype=np.bool)
mask[peak:] = True
trough_plus = np.ma.masked_array(a1, mask=a.mask | ~mask).argmin()
trough_minus  (np.ma.masked_array(a1, mask=a.mask | mask).argmin()

Again, since masked arrays have reversed masks, it's important to combine the masks using | instead of &, as you would for normal numpy boolean masks. In this case, there is no need for calling the nan version of argmax and argmin, since all the nans are already masked out.

Hopefully, the generalization to multiple dimensions becomes clear from here, given the prevalence of the axis keyword in numpy functions:

a = np.ma.array(a, mask=np.isnan(a))
peak_indices = a.argmax(axis=0).reshape(1, *a.shape[1:])
mask = np.arange(a.shape[0]).reshape(-1, *(1,) * (a.ndim - 1)) >= peak_indices

trough_plus = np.ma.masked_array(a, mask=~mask | a.mask).argmin(axis=0)
trough_minus = np.ma.masked_array(a, mask=mask | a.mask).argmin(axis=0)

N-dimensional masking technique comes from Fill mask efficiently based on start indices, which was asked just for this purpose.

Upvotes: 2

Paul Panzer
Paul Panzer

Reputation: 53069

Here is a method that

  1. copies the data
  2. saves all nan positions and replaces all nans with global min-1
  3. finds the rowwise argmax
  4. subtracts its value from the entire row
    • note that each row now has only non-positive values with the max value now being zero
  5. zeros all nan positions
  6. flips the sign of all values right of the max
    • this is the main idea; it creates a new row-global max at the position where before there was the right hand min; at the same time it ensures that the left hand min is now row-global
  7. retrieves the rowwise argmin and argmax, these are the postitions of the left and right mins in the original array
  8. finds all-nan rows and overwrites the max and min indices at these positions with INVALINT

Code:

INVALINT = -9999
t,x,y = a.shape
t,x,y = np.ogrid[:t,:x,:y]
inval = np.isnan(a)
b = np.where(inval,np.nanmin(a)-1,a)
pk = b.argmax(axis=0)
pkval = b[pk,x,y]
b -= pkval
b[inval] = 0
b[t>pk[None]] *= -1
ltr = b.argmin(axis=0)
rtr = b.argmax(axis=0)
del b
inval = inval.all(axis=0)
pk[inval] = INVALINT
ltr[inval] = INVALINT
rtr[inval] = INVALINT

# result is now in ltr ("left trough"), pk ("peak") and rtr

Upvotes: 1

Related Questions