Numpy dynamic array slicing based on min/max values

Question

I have a 3-dimensional array of shape (365, x, y), where 365 corresponds to daily data. In some cases, all the elements along the time axis (axis=0) are np.nan.

The time series for each point along the axis=0 looks something like this:

I need to find the index at which the maximum value (peak data) occurs and then the two minimum values on each side of the peak.

import numpy as np

a = np.random.random((365, 3, 3)) * 10  # shape must be passed as a tuple
a[:, 0, 0] = np.nan

peak_mask = np.ma.masked_array(a, np.isnan(a))
peak_indexes = np.nanargmax(peak_mask, axis=0)

I can find the minimum before the peak using something like this:

early_minimum_indexes = np.full_like(peak_indexes, fill_value=0)

for i in range(peak_indexes.shape[0]):
    for j in range(peak_indexes.shape[1]):
        if peak_indexes[i, j] == 0:
            early_minimum_indexes[i, j] = 0
        else:
            early_mask = np.ma.masked_array(a, np.isnan(a))
            early_loc = np.nanargmin(early_mask[:peak_indexes[i, j], i, j], axis=0)   
            early_minimum_indexes[i, j] = early_loc

With the resulting peak and trough plotted like this:

This approach is unreasonably slow for large arrays (1M+ elements). Is there a better way to do this using numpy?

Mad Physicist · Accepted Answer

While using masked arrays may not be the most efficient solution in this case, it will allow you to perform masked operations on specific axes while more-or-less preserving shape, which is a great convenience. Keep in mind that in many cases, the masked functions will still end up copying the masked data.

You have mostly the right idea in your current code, but you missed a couple of tricks, like being able to negate and combine masks. It is also more efficient to allocate masks as boolean up front, and there are little nitpicks like np.full(..., 0) -> np.zeros(..., dtype=bool).

Let's work through this backwards. Let's say you had a well-behaved 1-D array with a peak, say a1. You can use masking to easily find the maxima and minima (or indices) like this:

peak_index = np.nanargmax(a1)
mask = np.zeros(a1.size, dtype=bool)
mask[peak_index:] = True
trough_plus = np.nanargmin(np.ma.array(a1, mask=~mask))
trough_minus = np.nanargmin(np.ma.array(a1, mask=mask))

This respects the fact that masked arrays flip the sense of the mask relative to normal numpy boolean indexing. It's also OK that the maximum value appears in the calculation of trough_plus, since it's guaranteed not to be a minimum (unless you have the all-nan situation).
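To make the flipped mask sense concrete, here is a minimal sketch on a small made-up series. The values in a1 are hypothetical and contain no NaNs, so the plain argmax and the masked-array argmin method are used instead of the nan variants:

```python
import numpy as np

# Hypothetical 1-D series: dips to 1.0, peaks at 9.0, then dips to 0.5.
a1 = np.array([3.0, 1.0, 4.0, 9.0, 2.0, 0.5, 6.0])

peak_index = np.argmax(a1)

# True from the peak onward, False before it.
mask = np.zeros(a1.size, dtype=bool)
mask[peak_index:] = True

# Boolean indexing KEEPS the entries where the mask is True...
kept_by_indexing = a1[mask]
# ...while a masked array HIDES the entries where the mask is True.
kept_by_masking = np.ma.array(a1, mask=mask).compressed()

# Hiding the peak and everything after it leaves the early part.
trough_minus = np.ma.array(a1, mask=mask).argmin()
# Hiding everything before the peak leaves the peak and the late part.
trough_plus = np.ma.array(a1, mask=~mask).argmin()

print(peak_index, trough_minus, trough_plus)
```

Note that both argmin calls return positions in the full array, not positions within the visible part, so no offset by peak_index is needed.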

Now if a1 was a masked array already (but still 1D), you could do the same thing, but combine the masks temporarily. For example:

a1 = np.ma.array(a1, mask=np.isnan(a1))
peak_index = a1.argmax()
mask = np.zeros(a1.size, dtype=bool)
mask[peak_index:] = True
trough_plus = np.ma.masked_array(a1, mask=a1.mask | ~mask).argmin()
trough_minus = np.ma.masked_array(a1, mask=a1.mask | mask).argmin()

Again, since masked arrays have reversed masks, it's important to combine the masks using | instead of &, as you would for normal numpy boolean masks. In this case, there is no need for calling the nan version of argmax and argmin, since all the nans are already masked out.
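The difference between | and & can be checked on a small example. This is only a sketch: the series raw and the helper names nan_mask, before, after and too_permissive are made up for illustration, and the masked arrays are built from the raw data so that the combined mask is the only mask in play:

```python
import numpy as np

# Hypothetical series with two gaps (NaN) and a peak of 9.0 at index 3.
raw = np.array([3.0, np.nan, 1.0, 9.0, 2.0, np.nan, 0.5])
nan_mask = np.isnan(raw)
a1 = np.ma.array(raw, mask=nan_mask)

peak_index = a1.argmax()  # NaNs are masked, so no nanargmax is needed

mask = np.zeros(a1.size, dtype=bool)
mask[peak_index:] = True

# "Hide it if it is NaN OR if it lies at/after the peak."
before = np.ma.masked_array(raw, mask=nan_mask | mask)
# "Hide it if it is NaN OR if it lies before the peak."
after = np.ma.masked_array(raw, mask=nan_mask | ~mask)

trough_minus = before.argmin()
trough_plus = after.argmin()

# With & only the elements that are BOTH NaN and at/after the peak are
# hidden, so almost everything (including one NaN) stays visible.
too_permissive = np.ma.masked_array(raw, mask=nan_mask & mask)

print(before.count(), too_permissive.count())
```

Here before.count() reports the two valid values ahead of the peak, while the & version leaves six elements visible, which is not the intended selection.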

Hopefully, the generalization to multiple dimensions becomes clear from here, given the prevalence of the axis keyword in numpy functions:

a = np.ma.array(a, mask=np.isnan(a))
peak_indices = a.argmax(axis=0).reshape(1, *a.shape[1:])
mask = np.arange(a.shape[0]).reshape(-1, *(1,) * (a.ndim - 1)) >= peak_indices

trough_plus = np.ma.masked_array(a, mask=~mask | a.mask).argmin(axis=0)
trough_minus = np.ma.masked_array(a, mask=mask | a.mask).argmin(axis=0)

The N-dimensional masking technique comes from "Fill mask efficiently based on start indices", which was asked just for this purpose.
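As a sanity check, the vectorized version can be compared against a plain per-pixel loop on a small array. This is a sketch under assumptions: the seeded random data, the extra NaN gap, and the names raw, days and checked are made up for illustration, and all-NaN pixels are simply skipped in the comparison:

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.random((365, 3, 3)) * 10
raw[:, 0, 0] = np.nan      # one pixel with no data at all
raw[10:20, 1, 1] = np.nan  # one pixel with a gap in the series

# Vectorized computation, following the approach above.
a = np.ma.array(raw, mask=np.isnan(raw))
peak_indices = a.argmax(axis=0).reshape(1, *a.shape[1:])
days = np.arange(a.shape[0]).reshape(-1, *(1,) * (a.ndim - 1))
mask = days >= peak_indices  # True at and after each pixel's peak

trough_plus = np.ma.masked_array(a, mask=~mask | a.mask).argmin(axis=0)
trough_minus = np.ma.masked_array(a, mask=mask | a.mask).argmin(axis=0)

# Brute-force reference: one 1-D series at a time.
checked = 0
for i in range(raw.shape[1]):
    for j in range(raw.shape[2]):
        col = raw[:, i, j]
        if np.isnan(col).all():
            continue  # nothing meaningful to compare for all-NaN pixels
        p = np.nanargmax(col)
        assert peak_indices[0, i, j] == p
        # The peak itself is visible in the "plus" half.
        assert trough_plus[i, j] == p + np.nanargmin(col[p:])
        # Only compare the "minus" half when it has valid data.
        if p > 0 and not np.isnan(col[:p]).all():
            assert trough_minus[i, j] == np.nanargmin(col[:p])
        checked += 1

print(checked, "pixels match the loop-based result")
```

The loop is only there for verification; on large arrays the masked-array version replaces it entirely.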
