Maxim L
Maxim L

Reputation: 209

Average of a 3D numpy slice based on 2D arrays

I am trying to calculate the average of a 3D array between two indices on the 1st axis. The start and end indices vary from cell to cell and are represented by two separate 2D arrays that are the same shape as a slice of the 3D array.

I have managed to implement a piece of code that loops through the pixels of my 3D array, but this method is painfully slow in the case of my array with a shape of (70, 550, 350). Is there a way to vectorise the operation using numpy or xarray (the arrays are stored in an xarray dataset)?

Here is a snippet of what I would like to optimise:

# My 3D raster containing values; shape = (time, x, y)
values = np.random.rand(10, 55, 60)

# A 2D raster containing start indices for the averaging
start_index = np.random.randint(0, 4, size=(values.shape[1], values.shape[2]))

# A 2D raster containing end indices for the averaging
end_index = np.random.randint(5, 9, size=(values.shape[1], values.shape[2]))

# Initialise an array that will contain results
mean_array = np.zeros_like(values[0, :, :])

# Loop over 3D raster to calculate the average between indices on axis 0
for i in range(0, values.shape[1]):
    for j in range(0, values.shape[2]):
        mean_array[i, j] = np.mean(values[start_index[i, j]: end_index[i, j], i, j], axis=0)

Upvotes: 1

Views: 220

Answers (1)

jakevdp
jakevdp

Reputation: 86328

One way to do this without loops is to zero-out the entries you don't want to use, compute the sum of the remaining items, then divide by the number of nonzero entries. For example:

i = np.arange(values.shape[0])[:, None, None]
mean_array_2 = np.where((i >= start_index) & (i < end_index), values, 0).sum(0) / (end_index - start_index)
np.allclose(mean_array, mean_array_2)
# True

Note that this assumes that the indices are in the range 0 <= i < values.shape[0]; if this is not the case you can use np.clip or other means to standardize the indices before computation.

Upvotes: 2

Related Questions