Wilmar van Ommeren

Reputation: 7699

Measure border overlap between numpy 2d regions

I have a large 2D numpy array (10000, 10000) with many regions (clustered cells with the same cell value). What I want is to merge neighbouring regions that show more than 35% border overlap. This overlap should be measured by dividing the size of the border shared with the neighbour by the total border size of the region.

I know how to detect the neighbouring regions (see the linked question), but I have no idea how to measure the border overlap.

As I am working with large arrays, a vectorized solution would be preferable.


Example

#input
region_arr=np.array([[1,1,3,3],[1,2,2,3],[2,2,4,4],[5,5,4,4]])

[Image: the example region array]

The output of the neighbour detection script is a 2D numpy array with the region in the first column and the neighbour in the second column.

#result of neighbour detection
>>> region_neighbour=detect_neighbours(region_arr)
>>> region_neighbour
array([[1, 2],
       [1, 3],
       [2, 1],
       [2, 3],
       [2, 4],
       [2, 5],
       [3, 1],
       [3, 2],
       [3, 4],
       [4, 2],
       [4, 3],
       [4, 5],
       [5, 2],
       [5, 4]])

I would like to add a column to the result of the neighbour detection that contains the fractional overlap between the region and its neighbour. For example, the overlap between regions 1 and 3 = common border size / total border size of region 1 = 1/8 = 0.125.
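The 1/8 above can be checked with a small sketch. It assumes that cell sides on the outer edge of the array count towards a region's total border (that is what makes the total for region 1 equal to 8), and it pads the array with the label 0, which is assumed not to be used by any region. The helper name `shared_border` is made up for this illustration.

```python
import numpy as np

region_arr = np.array([[1, 1, 3, 3],
                       [1, 2, 2, 3],
                       [2, 2, 4, 4],
                       [5, 5, 4, 4]])

# Pad with a label that does not occur (0 here, an assumption) so that
# cell sides on the edge of the array count as border too.
padded = np.pad(region_arr, 1, mode="constant", constant_values=0)

# Every pair of side-by-side cells: horizontal pairs first, then vertical.
a = np.concatenate([padded[:, :-1].ravel(), padded[:-1, :].ravel()])
b = np.concatenate([padded[:, 1:].ravel(), padded[1:, :].ravel()])

def shared_border(region, other):
    """Number of cell sides where `region` touches `other`."""
    touching = ((a == region) & (b == other)) | ((a == other) & (b == region))
    return int(np.sum(touching))

# Sides that separate region 1 from anything else, including the outside.
total_border_1 = int(np.sum((a == 1) != (b == 1)))

print(shared_border(1, 3), total_border_1)       # 1 8
print(shared_border(1, 3) / total_border_1)      # 0.125
```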

In this example the desired output would look like this:

#output
>>> percentual_overlap=measure_border_overlap(region_arr,region_neighbour)
>>> percentual_overlap
array([[ 1.       ,  3.       ,  0.125   ],
       [ 1.       ,  2.       ,  0.375   ],
       [ 2.       ,  1.       ,  0.3     ],
       [ 2.       ,  3.       ,  0.3     ],
       [ 2.       ,  4.       ,  0.2     ],
       [ 2.       ,  5.       ,  0.2     ],
       [ 3.       ,  1.       ,  0.125   ],
       [ 3.       ,  2.       ,  0.25    ],
       [ 3.       ,  4.       ,  0.125   ],
       [ 4.       ,  2.       ,  0.375   ],
       [ 4.       ,  3.       ,  0.125   ],
       [ 4.       ,  5.       ,  0.125   ],
       [ 5.       ,  2.       ,  0.333333],
       [ 5.       ,  4.       ,  0.166667]])       

With this output it is relatively easy to merge the regions that overlap more than 35% (regions 1 and 2; regions 4 and 2). After the region merging the new array will look like this:

[Image: the region array after merging]

Edit

You can calculate the perimeter of each region by applying the function of pv..

Upvotes: 1

Views: 503

Answers (1)

Eelco Hoogendoorn

Reputation: 10769

Take a look at the question "Count cells of adjacent numpy regions" for inspiration. Deciding how to merge based on such information is a problem with multiple answers, I think; it may not have a unique solution, since the result can depend on the order in which you proceed.

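import numpy as np
import numpy_indexed as npi

x = region_arr  # the 2D label array from the question

# Pair every cell with each of its 4-connected neighbours (both directions).
neighbors = np.concatenate([x[:, :-1].flatten(), x[:, 1:].flatten(), x[1:, :].flatten(), x[:-1, :].flatten()])
centers   = np.concatenate([x[:, 1:].flatten(), x[:, :-1].flatten(), x[:-1, :].flatten(), x[1:, :].flatten()])
# Keep only the pairs that straddle a border between two different regions.
border = neighbors != centers

# Count the shared border sides for every (neighbor, center) combination.
(neighbors, centers), counts = npi.count((neighbors[border], centers[border]))
# Total border length per region (sides on the outer edge of the array are
# not included here), then the share that each neighbour contributes.
region_group = npi.group_by(centers)
regions, neighbors_per_region = region_group.sum(counts)
fractions = counts / neighbors_per_region[region_group.inverse]
for result in zip(centers, neighbors, fractions):
    print(result)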
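If the total border of a region should also include the sides on the outer edge of the array, as the 1/8 in the question suggests, the same idea can be sketched with plain numpy. This is a variant under stated assumptions, not the code above: the function name `measure_border_overlap` is borrowed from the question, it takes only the array (the neighbour pairs are derived on the fly), and the `outside` label is a placeholder assumed not to occur in the array.

```python
import numpy as np

def measure_border_overlap(region_arr, outside=0):
    """Return rows of (region, neighbour, shared border / perimeter of region).

    `outside` is a placeholder label for the area beyond the array edge; it is
    assumed not to occur in `region_arr`.
    """
    padded = np.pad(region_arr, 1, mode="constant", constant_values=outside)

    # Each side between two adjacent cells, listed in both directions so that
    # every region appears in `centers` for all of its own sides.
    left, right = padded[:, :-1].ravel(), padded[:, 1:].ravel()
    up, down = padded[:-1, :].ravel(), padded[1:, :].ravel()
    centers = np.concatenate([left, right, up, down])
    neighbors = np.concatenate([right, left, down, up])

    # Keep sides that separate a real region from something else.
    border = (centers != neighbors) & (centers != outside)
    pairs = np.stack([centers[border], neighbors[border]], axis=1)

    # Shared border length per (region, neighbour) pair.
    pairs, counts = np.unique(pairs, axis=0, return_counts=True)

    # Perimeter per region: sum of the counts over all of its neighbours,
    # including the outside of the array.
    regions, inverse = np.unique(pairs[:, 0], return_inverse=True)
    perimeter = np.bincount(inverse, weights=counts)

    fractions = counts / perimeter[inverse]
    keep = pairs[:, 1] != outside  # drop the rows for the array edge itself
    return np.column_stack([pairs[keep], fractions[keep]])

region_arr = np.array([[1, 1, 3, 3],
                       [1, 2, 2, 3],
                       [2, 2, 4, 4],
                       [5, 5, 4, 4]])
result = measure_border_overlap(region_arr)
print(result)
```

Rows with a fraction above 0.35 can then be selected with `result[result[:, 2] > 0.35]`. For the example array this gives 0.125 for regions 1 and 3, as in the question; the values for (2, 3) and (4, 2) come out as 0.2 and 0.25, so they may not match the hand-made table. On a (10000, 10000) array the concatenated pair arrays are large, so processing in tiles may be needed.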

Upvotes: 1
