Reputation: 41
Assume a 2D array is divided into several sub-regions, given by regions, and another array of the same shape is filled with values. I would like to aggregate the values by sub-region. The following code is my solution.
But when the number of sub-regions is very large, the iteration takes a lot of time. Is there any way to speed up the program? I suppose NumPy could do this, but I don't know how.
import numpy as np
regions = np.array([[0,0,0,1],
                    [0,0,1,1],
                    [1,1,1,2],
                    [2,2,2,2]], dtype=np.int32)
value_array = np.array([[9,5,8,4],
                        [6,4,8,5],
                        [4,5,9,7],
                        [4,7,3,0]], dtype=np.float32)
aggre_array = np.zeros_like(value_array)
for r in range(regions.max()+1):
    region = regions==r
    aggre_array[region] = value_array[region].mean()
print(aggre_array)
'''output
[[6.4       6.4       6.4       5.8333335]
 [6.4       6.4       5.8333335 5.8333335]
 [5.8333335 5.8333335 5.8333335 4.2      ]
 [4.2       4.2       4.2       4.2      ]]
'''
Upvotes: 4
Views: 401
Reputation: 5949
For this kind of grouping you need to work with flattened versions of the arrays, reordered by the indices that sort the first array, like:
regions_sort = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
value_array_sort = [9, 5, 8, 6, 4, 4, 8, 5, 4, 5, 9, 7, 4, 7, 3, 0]
The next part is to find the indices that separate the individual groups and use them for the subsequent calculations of group sums, counts and means:
marker_idx = [5, 11]
group_counts = [5, 6, 5]
group_sums = [32, 35, 21]
group_means = [6.4, 5.83333, 4.2]
Finally, repeat the group means so that they match the size of value_array, rearrange them back into the original order and reshape to the initial shape.
sorter = np.argsort(regions.ravel())
_, inverse_sorter = np.unique(sorter, return_index=True) #could be optimised...
regions_sort = regions.ravel()[sorter]
value_array_sort = value_array.ravel()[sorter]
marker_idx = np.flatnonzero(np.diff(regions_sort))+1
reduceat_idx = np.r_[0, marker_idx]
group_counts = np.diff(marker_idx, prepend=0, append=regions.size) #could also use np.bincount...
group_sums = np.add.reduceat(value_array_sort, reduceat_idx) #sum of values within each group
group_means = group_sums / group_counts
new_values = np.repeat(group_means, group_counts)
new_value_array = new_values[inverse_sorter].reshape(value_array.shape)
>>> new_value_array
array([[6.4       , 6.4       , 6.4       , 5.83333333],
       [6.4       , 6.4       , 5.83333333, 5.83333333],
       [5.83333333, 5.83333333, 5.83333333, 4.2       ],
       [4.2       , 4.2       , 4.2       , 4.2       ]])
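As hinted in the comments above, the counting and summing can also be done with np.bincount, which sidesteps the sorting and the inverse sorter entirely. A minimal sketch, assuming the region labels are non-negative integers as in the question:
import numpy as np
labels = regions.ravel()
group_sums = np.bincount(labels, weights=value_array.ravel()) #per-region sums
group_counts = np.bincount(labels)                            #per-region element counts
group_means = group_sums / group_counts
new_value_array = group_means[regions] #fancy indexing with the 2D labels broadcasts the means back to the original shape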
I've also found a way to do it with the numpy_indexed package, which is designed for solving grouping problems in an efficient way:
import numpy_indexed as npi
groupby = npi.group_by(regions.ravel())
keys, values = groupby.mean(value_array.ravel())
>>> values[groupby.inverse].reshape(regions.shape)
array([[6.4       , 6.4       , 6.4       , 5.83333333],
       [6.4       , 6.4       , 5.83333333, 5.83333333],
       [5.83333333, 5.83333333, 5.83333333, 4.2       ],
       [4.2       , 4.2       , 4.2       , 4.2       ]])
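As a quick sanity check (a sketch, assuming aggre_array from the question and the variables defined above are still in scope), both approaches can be compared against the original loop result:
assert np.allclose(new_value_array, aggre_array)
assert np.allclose(values[groupby.inverse].reshape(regions.shape), aggre_array)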
Upvotes: 1