Let's say we have the following data array:
import numpy as np
data_array = np.array([[1, 1, 1], [1, 1, 2], [2, 2, 2], [3, 3, 3], [4, 4, 4]], np.int16)
data_array
array([[1, 1, 1],
       [1, 1, 2],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]], dtype=int16)
We want to mask the array according to the following inclusive ranges, so that a calculation can be applied to each masked part:
intervals = [[1, 2], [2, 3], [3, 4]]
We first create a fully masked array with the same shape as the data array, into which the results for each interval will be combined:
init = np.zeros((data_array.shape[0], data_array.shape[1]))
result_array = np.ma.masked_where((init == 0), init)
result_array
masked_array(
  data=[[--, --, --],
        [--, --, --],
        [--, --, --],
        [--, --, --],
        [--, --, --]],
  mask=[[ True,  True,  True],
        [ True,  True,  True],
        [ True,  True,  True],
        [ True,  True,  True],
        [ True,  True,  True]],
  fill_value=1e+20,
  dtype=float64)
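For reference, NumPy also provides `np.ma.masked_all`, which builds the same fully masked float array in a single call. A minimal sketch, assuming the same `data_array` as above:

```python
import numpy as np

data_array = np.array([[1, 1, 1], [1, 1, 2], [2, 2, 2], [3, 3, 3], [4, 4, 4]], np.int16)

# Fully masked float64 array with the same shape as data_array
result_array = np.ma.masked_all(data_array.shape, dtype=float)

print(result_array.shape)        # (5, 3)
print(result_array.mask.all())   # True
```

It can be used as a drop-in replacement for the `np.zeros` plus `masked_where` pair in the loop below.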
With this we can start a for loop that masks the array according to each interval range, performs a calculation on the masked array and combines the results into a single result array:
for inter in intervals:
    # Extract the start and end values of the interval range
    start_inter = inter[0]
    end_inter = inter[1]
    # Mask the array based on the interval range
    mask_init = np.ma.masked_where((data_array > end_inter), data_array)
    masked_array = np.ma.masked_where((mask_init < start_inter), mask_init)
    # Perform a dummy calculation on the masked array
    outcome = (masked_array + end_inter) * 100
    # Combine the outcome arrays: only still-masked cells are filled,
    # so earlier intervals take precedence
    result_array[result_array.mask] = outcome[result_array.mask]
This gives the following result:
array([[300.0, 300.0, 300.0],
       [300.0, 300.0, 400.0],
       [400.0, 400.0, 400.0],
       [600.0, 600.0, 600.0],
       [800.0, 800.0, 800.0]])
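Element by element, the loop amounts to a simple rule: each value takes the result of the first interval that contains it, because later iterations only fill cells that are still masked. The following plain-Python restatement is a sketch for checking that reading, not a vectorized solution; `first_match_reference` is a hypothetical helper, and `np.nan` stands in for cells the loop would leave masked:

```python
import numpy as np

def first_match_reference(data_array, intervals):
    """Hypothetical helper: each element takes the first inclusive interval
    [start, end] that contains it, mirroring the masked loop."""
    out = np.full(data_array.shape, np.nan)  # NaN marks "no interval matched"
    for idx, value in np.ndenumerate(data_array):
        for start, end in intervals:
            if start <= value <= end:
                out[idx] = (value + end) * 100
                break  # earlier intervals win, e.g. 2 uses [1, 2], not [2, 3]
    return out

data_array = np.array([[1, 1, 1], [1, 1, 2], [2, 2, 2], [3, 3, 3], [4, 4, 4]], np.int16)
intervals = [[1, 2], [2, 3], [3, 4]]
print(first_match_reference(data_array, intervals))
```

With the question's data this yields the same 300/400/600/800 values as the loop.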
My question is: how can the same result be achieved without this for loop, i.e. by applying the masking and the calculation to the whole data_array in a single operation? Note that the variables used in the calculation change with each mask. Is a vectorized approach possible here? I imagine numpy_indexed could be of some help. Thank you.
Answer:
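If the intervals can be made non-overlapping, then you could use a function like the one below. The restriction matters because this function adds up the results of every interval that contains a value, so a value on a shared boundary (such as 2 or 3 in the original intervals) would be counted twice, whereas the loop keeps only the first match: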
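import numpy as np

def func(data_array, intervals):
    data_array = np.asarray(data_array)
    # Split the intervals into start and end arrays, one entry per interval
    start, end = np.asarray(intervals).T
    # Add a trailing axis so every element is compared with every interval
    data_array_exp = data_array[..., np.newaxis]
    mask = (data_array_exp >= start) & (data_array_exp <= end)
    # Compute the outcome for every interval, zero out non-matches and sum
    return np.sum((data_array_exp + end) * mask * 100, axis=-1)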
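To see why this works, it helps to look at the shapes: the trailing axis added by `np.newaxis` lets each element be compared with every interval at once through broadcasting. A short sketch, reusing the question's data and the answer's non-overlapping intervals, that also counts how many intervals each element falls into (`overlap_counts` is an illustrative name):

```python
import numpy as np

data_array = np.array([[1, 1, 1], [1, 1, 2], [2, 2, 2], [3, 3, 3], [4, 4, 4]], np.int16)
intervals = [[1, 1.9], [2, 2.9], [3, 4]]

start, end = np.asarray(intervals).T          # each of shape (3,): one entry per interval
data_array_exp = data_array[..., np.newaxis]  # shape (5, 3, 1)

# (5, 3, 1) broadcast against (3,) gives one boolean per (row, column, interval)
mask = (data_array_exp >= start) & (data_array_exp <= end)
print(data_array_exp.shape, mask.shape)  # (5, 3, 1) (5, 3, 3)

# With these non-overlapping intervals every element matches exactly one interval
print(mask.sum(axis=-1).max())  # 1

# With the question's touching intervals, boundary values such as 2 and 3 match two
s, e = np.asarray([[1, 2], [2, 3], [3, 4]]).T
overlap_counts = ((data_array_exp >= s) & (data_array_exp <= e)).sum(axis=-1)
print(overlap_counts.max())  # 2
```

The last count is why summing along the interval axis is only safe once the intervals no longer share endpoints.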
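In that case, the result should be the same as with the original code. Note that the check below uses adjusted, non-overlapping intervals, so its values differ from those in the question (for example, about 290 instead of 300 for a value of 1):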
import numpy as np

def func_orig(data_array, intervals):
    init = np.zeros((data_array.shape[0], data_array.shape[1]))
    result_array = np.ma.masked_where((init == 0), init)
    for inter in intervals:
        start_inter = inter[0]
        end_inter = inter[1]
        mask_init = np.ma.masked_where((data_array > end_inter), data_array)
        masked_array = np.ma.masked_where((mask_init < start_inter), mask_init)
        outcome = (masked_array + end_inter) * 100
        result_array[result_array.mask] = outcome[result_array.mask]
    return result_array.data
data_array = np.array([[1, 1, 1], [1, 1, 2], [2, 2, 2], [3, 3, 3], [4, 4, 4]], np.int16)
intervals = [[1, 1.9], [2, 2.9], [3, 4]]
print(np.allclose(func(data_array, intervals), func_orig(data_array, intervals)))
# True
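If the original, touching intervals `[[1, 2], [2, 3], [3, 4]]` must be kept, one option (not part of the answer above) is `np.select`, which returns, for each element, the choice belonging to the first condition that is true. That matches the loop's rule that earlier intervals win. A minimal sketch, assuming unmatched elements may become `np.nan` (the loop leaves them masked instead); `func_select` is a hypothetical name:

```python
import numpy as np

def func_select(data_array, intervals):
    """Hypothetical first-match variant: earlier intervals win on shared boundaries.
    Elements outside every interval become NaN (the loop would leave them masked)."""
    data_array = np.asarray(data_array)
    start, end = np.asarray(intervals).T
    exp = data_array[..., np.newaxis]          # shape (..., 1)
    in_range = (exp >= start) & (exp <= end)   # shape (..., n_intervals)
    outcome = (exp + end) * 100                # calculation for every interval at once
    # np.select takes lists of arrays: move the interval axis to the front and split it
    condlist = list(np.moveaxis(in_range, -1, 0))
    choicelist = list(np.moveaxis(outcome, -1, 0))
    # For each element, np.select uses the first condition that is True
    return np.select(condlist, choicelist, default=np.nan)

data_array = np.array([[1, 1, 1], [1, 1, 2], [2, 2, 2], [3, 3, 3], [4, 4, 4]], np.int16)
print(func_select(data_array, [[1, 2], [2, 3], [3, 4]]))
```

With the question's original intervals this reproduces the loop's result (300 for 1, 400 for 2, 600 for 3 and 800 for 4). Like the answer's function, it evaluates the calculation for every interval, so memory grows with the number of elements times the number of intervals.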