Abi

Reputation: 83

Find most frequent value per coordinate in multiple 2d Arrays

I have multiple 2d arrays like this for example:

A = [[-1, -1, 0, 1, -1], [1, 1, 0, -1, -1], [-1, -1, -1, -1, -1], [-1, 1, -1, -1, 0]]
B = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
C = [[0, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
D = [[-1, -1, 0, 1, 0], [0, 0, -1, 0, 1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]

I need to find the most frequent value at each coordinate across all the arrays, so the output would look like this:

E = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]

I can definitely loop through each of these arrays, but I was looking for a vectorised approach. There can be around 10-11 of these arrays, and their dimensions are around 900x900.

Is it possible to solve this using list comprehension?

Upvotes: 2

Views: 205

Answers (3)

Hamzah Al-Qadasi

Reputation: 9806

You can simply use scipy.stats.mode:

from scipy import stats

arr = [A, B, C, D]
stats.mode(arr)[0][0].tolist()

#output
 [[-1, -1, 0, 1, -1],
 [1, -1, 0, -1, -1],
 [0, 1, -1, 1, -1],
 [-1, 1, -1, -1, -1]]
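
Note: in newer SciPy releases the return shape of stats.mode has changed (the keepdims behaviour), so the double indexing above may need adjusting. A minimal sketch, assuming SciPy 1.9+ where the keepdims argument is available:

from scipy import stats

arr = [A, B, C, D]
# axis=0 takes the mode across the four arrays at every coordinate;
# keepdims=False drops the reduced axis so the result is already 4x5
stats.mode(arr, axis=0, keepdims=False).mode.tolist()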

Upvotes: 0

I'mahdi

Reputation: 24059

You can zip all the arrays together, and for each cell (row, column) count the values coming from every array and keep the most common one, like below:

import numpy as np
from collections import Counter

def cell_wise_cnt(arrs):
    n_row = 0
    # the result has the same shape as a single input array
    res = np.empty((len(arrs[0]), len(arrs[0][0])))
    for row in zip(*arrs):  # corresponding rows from every array
        arr = np.array(row)
        num_col = len(arr[0])
        for col in range(num_col):
            # most common value in this column across all arrays
            res[n_row][col] = Counter(arr[:, col]).most_common()[0][0]
        n_row += 1
    return res

Output:

>>> cell_wise_cnt(arrs = (A,B,C,D))

array([[-1., -1.,  0.,  1., -1.],
       [ 1., -1.,  0., -1., -1.],
       [ 0.,  1., -1.,  1., -1.],
       [-1.,  1., -1., -1., -1.]])

Benchmark on colab:

%timeit cell_wise_cnt(arrs = (A,B,C,D))
# 136 µs per loop

%timeit stats.mode([A,B,C,D]).mode
# 585 µs per loop

%timeit stats.mode((A,B,C,D)*100_000).mode
# 1.73 s per loop

%timeit cell_wise_cnt(arrs = (A,B,C,D)*100_000)
# 2.38 s per loop

With Python 3.9 and Julio_Lopes's answer we can get a better runtime:

import statistics
def Julio_Lopes(arrs):
    return [[statistics.mode(j) for j in zip(*i)] for i in zip(*arrs)]

%timeit Julio_Lopes(arrs = (A,B,C,D))
# 106 µs per loop

%timeit Julio_Lopes(arrs = (A,B,C,D)*100_000)
# 653 ms per loop
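
Since the question asks for a vectorised approach on roughly 900x900 arrays, here is a minimal NumPy sketch, assuming all arrays share the same shape and the values come from a small known set such as {-1, 0, 1} (the function name and the values tuple are illustrative, not from the question):

import numpy as np

def vectorized_mode(arrs, values=(-1, 0, 1)):
    # stack into shape (n_arrays, rows, cols)
    stacked = np.stack(arrs)
    # for every candidate value, count how often it occurs at each coordinate
    counts = np.stack([(stacked == v).sum(axis=0) for v in values])
    # pick the candidate with the highest count (ties go to the earlier value in `values`)
    return np.asarray(values)[counts.argmax(axis=0)]

vectorized_mode((A, B, C, D)).tolist() should reproduce E from the question.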

Upvotes: 1

Cesar Lopes

Reputation: 413

Using a list comprehension gets a little hacky. It took some work, but here it is.

Basically you have to use nested list comprehensions, and the arrays must all be the same size for this to work.

To work with a single matrix you would need just one nested comprehension, but since we are working with a list of matrices the data is 3-dimensional, so two nested comprehensions are needed.

I used mode from the statistics module to get the most frequent value.

from statistics import mode


A = [[-1, -1, 0, 1, -1], [1, 1, 0, -1, -1], [-1, -1, -1, -1, -1], [-1, 1, -1, -1, 0]]
B = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
C = [[0, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
D = [[-1, -1, 0, 1, 0], [0, 0, -1, 0, 1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]

matrixes = [A, B, C, D]

result = [[mode([x[k][j] for x in matrixes]) for j in range(len(matrixes[0][0]))] for k in range(len(matrixes[0]))]


print(result)

result:

[[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]

Upvotes: 1
