Reputation: 83
I have multiple 2D arrays, for example:
A = [[-1, -1, 0, 1, -1], [1, 1, 0, -1, -1], [-1, -1, -1, -1, -1], [-1, 1, -1, -1, 0]]
B = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
C = [[0, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
D = [[-1, -1, 0, 1, 0], [0, 0, -1, 0, 1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
I need to find the most frequent value across each respective coordinate so the output would be like this:
E = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
I can definitely loop through each of these arrays, but I was looking for a vectorised approach. There can be around 10-11 arrays, and each array's dimensions are around 900x900.
Is it possible to solve this using list comprehension?
Upvotes: 2
Views: 205
Reputation: 9806
You can simply use scipy.stats.mode (which by default computes the mode along axis 0, i.e. across the stacked arrays):
from scipy import stats
arr = [A, B, C, D]
stats.mode(arr)[0][0].tolist()
#output
[[-1, -1, 0, 1, -1],
[1, -1, 0, -1, -1],
[0, 1, -1, 1, -1],
[-1, 1, -1, -1, -1]]
Upvotes: 0
Reputation: 24059
You can zip all the arrays together, then for each cell count the values across all arrays and keep the most common one, like below:
import numpy as np
from collections import Counter

def cell_wise_cnt(arrs):
    n_row = 0
    res = np.empty((len(arrs[0]), len(arrs[0][0])))
    for row in zip(*arrs):
        arr = np.array(row)
        num_col = len(arr[0])
        for col in range(num_col):
            # most common value in this cell across all arrays
            res[n_row][col] = Counter(arr[:, col]).most_common()[0][0]
        n_row += 1
    return res
Output:
>>> cell_wise_cnt(arrs = (A,B,C,D))
array([[-1., -1., 0., 1., -1.],
[ 1., -1., 0., -1., -1.],
[ 0., 1., -1., 1., -1.],
[-1., 1., -1., -1., -1.]])
Benchmark on colab:
%timeit cell_wise_cnt(arrs = (A,B,C,D))
# 136 µs per loop
%timeit stats.mode([A,B,C,D]).mode
# 585 µs per loop
%timeit stats.mode((A,B,C,D)*100_000).mode
# 1.73 s per loop
%timeit cell_wise_cnt(arrs = (A,B,C,D)*100_000)
# 2.38 s per loop
With Python 3.9 and Julio Lopes's answer we can get a better runtime:
import statistics
def Julio_Lopes(arrs):
return [[statistics.mode(j) for j in zip(*i)] for i in zip(*arrs)]
%timeit Julio_Lopes(arrs = (A,B,C,D))
# 106 µs per loop
%timeit Julio_Lopes(arrs = (A,B,C,D)*100_000)
# 653 ms per loop
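For completeness, a NumPy-only vectorised variant is also possible (a sketch, and `numpy_mode` is a name I made up, not from the question): since the set of distinct values is small, you can broadcast each candidate value against the stacked arrays, count matches per cell, and take the argmax. Note that ties resolve to the smallest value, which may differ from Counter/statistics.mode tie-breaking.

```python
import numpy as np

A = [[-1, -1, 0, 1, -1], [1, 1, 0, -1, -1], [-1, -1, -1, -1, -1], [-1, 1, -1, -1, 0]]
B = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
C = [[0, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
D = [[-1, -1, 0, 1, 0], [0, 0, -1, 0, 1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]

def numpy_mode(arrs):
    # Stack into shape (n_arrays, rows, cols) and shift so values start at 0.
    stacked = np.stack([np.asarray(a) for a in arrs])
    lo = stacked.min()
    shifted = stacked - lo
    # For each candidate value, count how many arrays hold it in each cell:
    # the comparison broadcasts to (n_values, n_arrays, rows, cols), and
    # summing over the array axis leaves (n_values, rows, cols).
    values = np.arange(shifted.max() + 1)
    counts = (shifted[None, ...] == values[:, None, None, None]).sum(axis=1)
    # argmax over the value axis picks the most frequent value per cell.
    return counts.argmax(axis=0) + lo

print(numpy_mode([A, B, C, D]).tolist())
```

This avoids the SciPy dependency and the Python-level loops, at the cost of building an intermediate array of shape (n_values, n_arrays, rows, cols), which is fine for ~11 distinct values on 900x900 arrays.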
Upvotes: 1
Reputation: 413
Using a list comprehension gets a little hacky. It took some work, but here it is.
Basically you have to nest list comprehensions, and the arrays must all be the same size for this to work.
For a single matrix you would need just one nested level, but since we are working with a list of matrices the data is three-dimensional, so two nested levels are needed.
I used mode from the statistics module to get the most frequent value.
from statistics import mode
A = [[-1, -1, 0, 1, -1], [1, 1, 0, -1, -1], [-1, -1, -1, -1, -1], [-1, 1, -1, -1, 0]]
B = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
C = [[0, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
D = [[-1, -1, 0, 1, 0], [0, 0, -1, 0, 1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
matrixes = [A, B, C, D]
result = [[mode([x[k][j] for x in matrixes]) for j in range(len(matrixes[0][0]))] for k in range(len(matrixes[0]))]
print(result)
result:
[[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
Upvotes: 1