Reputation: 7699
I have a large 2D NumPy array (10000, 10000) containing many regions (clusters of connected cells that share the same cell value). What I want is to merge neighbouring regions that show more than 35% border overlap. This overlap should be measured by dividing the length of the border shared with the neighbour by the total border length of the region.
I know how to detect the neighbouring regions (look here), but I have no idea how to measure the border overlap.
As I am working with large arrays, a vectorized solution would be ideal.
# input
import numpy as np
region_arr = np.array([[1, 1, 3, 3],
                       [1, 2, 2, 3],
                       [2, 2, 4, 4],
                       [5, 5, 4, 4]])
The output of the neighbour detection script is a 2D NumPy array with the region in the first column and its neighbour in the second column.
#result of neighbour detection
>>> region_neighbour=detect_neighbours(region_arr)
>>> region_neighbour
array([[1, 2],
       [1, 3],
       [2, 1],
       [2, 3],
       [2, 4],
       [2, 5],
       [3, 1],
       [3, 2],
       [3, 4],
       [4, 2],
       [4, 3],
       [4, 5],
       [5, 2],
       [5, 4]])
I would like to add a column to the result of the neighbour detection that contains the fractional overlap between the region and its neighbour. For example, the overlap of region 1 with region 3 = common border size / total border size of region 1 = 1/8 = 0.125.
In this example the desired output would look like this:
#output
>>> percentual_overlap=measure_border_overlap(region_arr,region_neighbour)
>>> percentual_overlap
array([[1.      , 3.      , 0.125   ],
       [1.      , 2.      , 0.375   ],
       [2.      , 1.      , 0.3     ],
       [2.      , 3.      , 0.3     ],
       [2.      , 4.      , 0.2     ],
       [2.      , 5.      , 0.2     ],
       [3.      , 1.      , 0.125   ],
       [3.      , 2.      , 0.25    ],
       [3.      , 4.      , 0.125   ],
       [4.      , 2.      , 0.375   ],
       [4.      , 3.      , 0.125   ],
       [4.      , 5.      , 0.125   ],
       [5.      , 2.      , 0.333333],
       [5.      , 4.      , 0.166667]])
With this output it is relatively easy to merge the regions that overlap more than 35% (regions 1 and 2; regions 4 and 2). After the region merging the new array will look like this:
You can calculate the perimeter of each region by applying the function of pv..
Upvotes: 1
Views: 503
Reputation: 10769
Take a look at the question "Count cells of adjacent numpy regions" for inspiration. Deciding how to merge based on such information is, I think, a problem with multiple possible answers; the result may not be unique, because it can depend on the order in which you process the merges.
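To make the merging step concrete, here is a minimal sketch of one possible reading: every pair above the threshold is merged transitively in a single pass (a small union-find), and overlaps are not recomputed after each merge. That choice sidesteps the ordering question rather than settling it. The function name `merge_regions`, its parameters, and the convention that a merged group takes its smallest label are all assumptions for illustration; the three overlap rows are copied from the desired output posted in the question, and labels are assumed to be non-negative integers.

```python
import numpy as np

def merge_regions(region_arr, overlap, threshold=0.35):
    """Relabel regions whose overlap fraction exceeds `threshold` (hypothetical helper)."""
    labels = np.unique(region_arr)
    parent = {int(l): int(l) for l in labels}

    def find(l):
        # Follow parent links to the group's representative label.
        while parent[l] != l:
            parent[l] = parent[parent[l]]
            l = parent[l]
        return l

    for region, neighbour, frac in overlap:
        if frac > threshold:
            ra, rb = find(int(region)), find(int(neighbour))
            if ra != rb:
                # Assumption: the merged group keeps the smaller label.
                parent[max(ra, rb)] = min(ra, rb)

    # Lookup table mapping every old label to its merged label.
    lut = np.arange(labels.max() + 1)
    for l in parent:
        lut[l] = find(l)
    return lut[region_arr]

region_arr = np.array([[1, 1, 3, 3],
                       [1, 2, 2, 3],
                       [2, 2, 4, 4],
                       [5, 5, 4, 4]])
# Rows of (region, neighbour, fraction), as posted in the question.
overlap = np.array([[1, 2, 0.375],
                    [4, 2, 0.375],
                    [3, 1, 0.125]])
merged = merge_regions(region_arr, overlap)
print(merged)
```

With these inputs regions 1, 2 and 4 end up sharing label 1, because 2 links the other two together. If you instead recompute the fractions after each individual merge, later decisions can change, which is where the order dependence comes from.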
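import numpy as np
import numpy_indexed as npi

x = region_arr  # the labelled 2D array from the question
# Pair every cell with its horizontal and vertical neighbours, in both directions.
neighbors = np.concatenate([x[:, :-1].flatten(), x[:, +1:].flatten(), x[+1:, :].flatten(), x[:-1, :].flatten()])
centers = np.concatenate([x[:, +1:].flatten(), x[:, :-1].flatten(), x[:-1, :].flatten(), x[+1:, :].flatten()])
# Keep only the pairs that straddle a border between two different regions.
border = neighbors != centers
# Count the shared edges for each unique (neighbor, center) pair.
(neighbors, centers), counts = npi.count((neighbors[border], centers[border]))
# Sum the counts per region; edges on the outer boundary of the array are not included.
region_group = npi.group_by(centers)
regions, neighbors_per_region = region_group.sum(counts)
fractions = counts / neighbors_per_region[region_group.inverse]
for result in zip(centers, neighbors, fractions):
    print(result)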
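If you would rather avoid the extra dependency, or want the array's outer edge to count towards a region's total border (as the 1/8 in the question implies), the same idea can be sketched in plain NumPy. This is only a sketch under assumptions: the function takes just the array (the neighbour list is derived along the way), labels are positive integers, and 0 is free to use as a padding sentinel.

```python
import numpy as np

def measure_border_overlap(region_arr):
    """Rows of (region, neighbour, shared border / total border of region)."""
    # Pad with a sentinel label so edges on the array boundary count toward
    # a region's total border (assumption: 0 is not a real region label).
    p = np.pad(region_arr, 1, mode="constant", constant_values=0)
    # Every horizontally and vertically adjacent cell pair, in both directions.
    a = np.concatenate([p[:, :-1].ravel(), p[:, 1:].ravel(),
                        p[:-1, :].ravel(), p[1:, :].ravel()])
    b = np.concatenate([p[:, 1:].ravel(), p[:, :-1].ravel(),
                        p[1:, :].ravel(), p[:-1, :].ravel()])
    # Keep edges where a real region touches a different label (or the outside).
    keep = (a != b) & (a != 0)
    a, b = a[keep], b[keep]
    # Total border length per region, indexed by label.
    total = np.bincount(a)
    # Shared border length per (region, neighbour) pair, ignoring the sentinel.
    real = b != 0
    pairs, counts = np.unique(np.stack([a[real], b[real]], axis=1),
                              axis=0, return_counts=True)
    fractions = counts / total[pairs[:, 0]]
    return np.column_stack([pairs, fractions])

region_arr = np.array([[1, 1, 3, 3],
                       [1, 2, 2, 3],
                       [2, 2, 4, 4],
                       [5, 5, 4, 4]])
percentual_overlap = measure_border_overlap(region_arr)
print(percentual_overlap)
```

On the example array this gives 0.125 for region 1 against region 3 and 0.375 for region 1 against region 2, as in the question. For the pairs (2, 3) and (4, 2) it gives 0.2 and 0.25, which is lower than the figures listed in the question's table, so those two rows may be worth re-checking by hand. For a 10000 x 10000 input the concatenated pair arrays are large, so handling the horizontal and vertical directions separately and adding up the counts may be easier on memory.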
Upvotes: 1