Reputation: 222471
I need to calculate a most frequent element in a matrix based on the neighbor values and itself. I found a generic_filter function which I used to calculate what I wanted. So here is how I can do this for a 2d array
arr = np.array([
[1, 2, 4],
[5, 6, 7],
[2, 4, 4]
])
def most_frequent(arr):
def most_frequent(val):
return Counter(val).most_common(1)[0][0]
footprint = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
return ndimage.generic_filter(arr, most_frequent, footprint=footprint, mode='constant')
print most_frequent(arr)
This returns me
[[0 0 0]
[0 4 0]
[0 0 0]]
Ignore the elements on the edge. As you see the middle element is 4 because this is a most frequent element among the neighbors and the value.
The big problem is that I need to do the same thing for a 3d matrix. So for a matrix like this
arr = np.array([
[[1, 1], [2, 2], [4, 4]],
[[5, 5], [6, 6], [7, 7]],
[[2, 2], [4, 4], [4, 4]]
])
I expect to get [0, 0]
everywhere and [4, 4]
in the middle. This fails with RuntimeError('filter footprint array has incorrect shape.')
. The worse thing that I have doubts that I can use generic_filter here because the docs say:
cval : scalar, optional Value to fill past edges of input if mode is ‘constant’.
So how can I solve my problem?
Upvotes: 0
Views: 418
Reputation: 18628
Here a fully vectored solution.
First make flattened neighborhoods:
(n,m,_)=M.shape
(sn,sm,s2)=M.strides
newshape=(n-2,m-2,9,2)
newstrides=(sn,sm,2*s2,s2)
neighborhoods=np.lib.stride_tricks.as_strided(M,newshape,newstrides)
"""
array([[[[1, 1],
[2, 2],
[4, 1],
[1, 1],
[5, 5],
[6, 6],
[7, 7],
[2, 3],
[2, 2]],
[[2, 2],
[4, 1],
[1, 1],
[5, 5],
[6, 6],
[7, 7],
[2, 3],
[2, 2],
[4, 1]]]])
"""
Then you have to pack the two components to use np.unique
which work with 1D arrays.
assuming M.dtype
is int32, you can do that by view:
packed_neighborhoods=np.ascontiguousarray(neighborhoods).view(int64)
In [5]: packed_neighborhoods.shape
Out[5]: (1, 2, 9, 1)
Now we define a function that take a 1D array and find the indice of the most frequent , based on np.unique
:
def mostfreq(arr):
_,index,counts=unique(arr, return_index=True, return_counts=True)
return index[counts.argmax()]
Apply it on the good axis:
ind2=apply_along_axis(mostfreq,2,packed_neighborhoods).squeeze()
And there is the result, including other indices.
ind0,ind1=indices(neighborhoods.shape[:2])
print(neighborhoods[ind0,ind1,ind2])
"""
[[[1 1]
[4 1]]]
"""
But your solution has same performance at the moment ;(.
Upvotes: 1
Reputation: 222471
One way I have found how to achieve this is by doing something like
def most_frequent(M):
x, y, _ = arr.shape
res = np.zeros((x - 2, y - 2, 2))
for i in xrange(1, x - 1):
for j in xrange(1, y - 1):
neighbors = [M[i - 1, j - 1], M[i - 1, j], M[i - 1, j + 1], M[i, j - 1], M[i, j], M[i, j + 1], M[i + 1, j - 1], M[i + 1, j], M[i + 1, j + 1]]
res[i - 1, j - 1] = Counter([tuple(_) for _ in neighbors]).most_common(1)[0][0]
return res
Still looking for a better solution (the one that does not involve my 2 loops).
Upvotes: 0