Reputation: 105
I have 5 numpy arrays:
array_1 = [1,2,3]
array_2 = [4,5,6]
array_3 = [7,8,9]
array_4 = [10,11,12]
array_5 = [1,2,3]
I need to compare them all: essentially, if ANY two of the 5 arrays above have the same values at the same indices, I need to know about it. Currently, I have something like this:
index_array_1 = np.where(array_1 == array_2)[0]
index_array_2 = np.where(array_1 == array_3)[0]
index_array_3 = np.where(array_1 == array_4)[0]
index_array_4 = np.where(array_1 == array_5)[0]
index_array_5 = np.where(array_2 == array_3)[0]
index_array_6 = np.where(array_2 == array_4)[0]
index_array_7 = np.where(array_2 == array_5)[0]
index_array_8 = np.where(array_3 == array_4)[0]
index_array_9 = np.where(array_3 == array_5)[0]
index_array_10 = np.where(array_4 == array_5)[0]
So, in this case, only index_array_4 would return any values, because array_1 and array_5 match. But this clearly isn't the best way to do it: it's a lot of code, and it's slow to run as well.
Is there something I haven't come across yet where I can essentially say "if ANY of the 5 arrays match, tell me, and also let me know which two arrays are the ones that match"?
I'd also like it to return the index array (like the np.where output above) for the matching pair.
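The ten hand-written np.where calls can at least be collapsed into a single loop over itertools.combinations. A minimal sketch, assuming the arrays are kept in a dict keyed by name (arrays and find_matches are hypothetical names, not from the question). Like the original code, it also reports partial matches, i.e. pairs that agree at only some indices:

```python
from itertools import combinations

import numpy as np

# Hypothetical container: the question's five arrays, keyed by name.
arrays = {
    "array_1": np.array([1, 2, 3]),
    "array_2": np.array([4, 5, 6]),
    "array_3": np.array([7, 8, 9]),
    "array_4": np.array([10, 11, 12]),
    "array_5": np.array([1, 2, 3]),
}


def find_matches(named_arrays):
    """Map each (name_a, name_b) pair to the indices where the two arrays agree.

    Pairs with no matching position are left out. Assumes equal-length 1-D arrays.
    """
    matches = {}
    # combinations() yields each unordered pair once: 10 pairs for 5 arrays
    for (name_a, a), (name_b, b) in combinations(named_arrays.items(), 2):
        idx = np.where(a == b)[0]  # same test as the hand-written np.where calls
        if idx.size:
            matches[(name_a, name_b)] = idx
    return matches


result = find_matches(arrays)
print(result)
```

With this data only the ('array_1', 'array_5') pair is reported, with indices [0, 1, 2].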
Upvotes: 1
Views: 1453
Reputation: 15872
You can try a one-liner:
>>> import numpy as np
>>> from itertools import combinations
>>> [arrays for arrays in combinations([f"array_{i}" for i in range(1, 6)], 2)
...  if np.all(np.equal(*map(globals().get, arrays)))]
Output:
[('array_1', 'array_5')]
EXPLANATION:
>>> [f"array_{i}" for i in range(1,6)]
['array_1', 'array_2', 'array_3', 'array_4', 'array_5']
>>> list(combinations([f"array_{i}" for i in range(1,6)],2))
[('array_1', 'array_2'),
('array_1', 'array_3'),
('array_1', 'array_4'),
('array_1', 'array_5'),
('array_2', 'array_3'),
('array_2', 'array_4'),
('array_2', 'array_5'),
('array_3', 'array_4'),
('array_3', 'array_5'),
('array_4', 'array_5')]
Now it iterates through the combinations. Taking the first combination as an example, the remaining steps look like this:
>>> [*map(globals().get, ('array_1', 'array_2'))]
[[1, 2, 3], [4, 5, 6]]
>>> np.all(np.equal([1, 2, 3], [4, 5, 6]))
False
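One caveat with np.all(np.equal(...)): np.equal broadcasts its inputs, so a length-1 array can look equal to a longer one. If the arrays might differ in length, np.array_equal, which also compares shapes, is the safer test. A small illustration with made-up values (a, b and c are hypothetical):

```python
import numpy as np

a = [1, 1, 1]
b = [1, 1, 1]
c = [1]  # different length

# The answer's test: element-wise comparison, then reduce with np.all
print(np.all(np.equal(a, b)))  # True

# Pitfall: [1] broadcasts against [1, 1, 1], so the arrays look "equal"
print(np.all(np.equal(a, c)))  # True

# np.array_equal checks the shapes first, so it reports the mismatch
print(np.array_equal(a, b))  # True
print(np.array_equal(a, c))  # False
```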
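EDIT:
globals() only sees module-level names, so if the arrays are local variables inside a function, try:
from itertools import combinations
import numpy as np

def bar():
    array_1 = [1, 2, 3]
    array_2 = [4, 5, 6]
    array_3 = [7, 8, 9]
    array_4 = [10, 11, 12]
    array_5 = [1, 2, 3]
    scope = locals()  # snapshot of the local variables, keyed by name
    return [arrays for arrays in combinations([f"array_{i}" for i in range(1, 6)], 2)
            if np.all(eval(arrays[0], scope) == eval(arrays[1], scope))]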
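Another option, which avoids both globals() and eval, is to keep the arrays in a dict from the start so the names are explicit data. A minimal sketch; matching_pairs and the dict layout are hypothetical, and np.array_equal is used instead of np.all(np.equal(...)) so arrays of different lengths compare as unequal:

```python
from itertools import combinations

import numpy as np


def matching_pairs(named_arrays):
    """Return every pair of names whose arrays are identical (same shape, same values)."""
    return [
        (name_a, name_b)
        for (name_a, a), (name_b, b) in combinations(named_arrays.items(), 2)
        if np.array_equal(a, b)
    ]


def bar():
    # Same data as above, collected in a dict instead of separate local variables,
    # so no lookup by variable name is needed
    named = {
        "array_1": [1, 2, 3],
        "array_2": [4, 5, 6],
        "array_3": [7, 8, 9],
        "array_4": [10, 11, 12],
        "array_5": [1, 2, 3],
    }
    return matching_pairs(named)


print(bar())  # [('array_1', 'array_5')]
```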
Upvotes: 2
Reputation:
You can use the .count() method to check whether the same array occurs more than once:
def compare(*arrays):
    # Convert every argument to a plain list so that .count() and .index() compare by value
    temp = [list(x) for x in arrays]
    for i in range(len(temp)):
        if temp.count(temp[i]) > 1:
            # Search only the part of the list after i, then shift back to a full-list index
            return (i, i + 1 + temp[i + 1:].index(temp[i]))
    # No array occurs twice
    return False
The first line of the function converts every array passed as an argument to a list. If the current element temp[i] occurs more than once, the function returns i together with the index of the other identical array. That index is found with .index() on the slice of the list after position i (so temp[i] itself is skipped) and then shifted by i + 1 to turn it back into an index into the full list. Only the first matching pair is reported.
print(compare(array_1, array_2, array_3, array_4, array_5))
prints
(0, 4)
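If you need every duplicate rather than only the first pair, a dictionary keyed by the array contents finds all groups in one pass instead of calling .count() for each element (which rescans the whole list each time). This hashing approach is a swap-in, not part of the answer above; duplicate_groups is a hypothetical name, and arrays are treated as equal when both shape and values match:

```python
from collections import defaultdict

import numpy as np


def duplicate_groups(*arrays):
    """Group argument positions whose arrays are identical.

    Each array becomes a hashable (shape, values) key; positions that share a
    key hold equal arrays. Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for position, arr in enumerate(arrays):
        a = np.asarray(arr)
        groups[(a.shape, tuple(a.ravel().tolist()))].append(position)
    return [members for members in groups.values() if len(members) > 1]


array_1 = [1, 2, 3]
array_2 = [4, 5, 6]
array_3 = [7, 8, 9]
array_4 = [10, 11, 12]
array_5 = [1, 2, 3]

print(duplicate_groups(array_1, array_2, array_3, array_4, array_5))  # [[0, 4]]
```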
Upvotes: 0
Reputation: 59711
You can do that like this:
import numpy as np
array_1 = [1, 2, 3]
array_2 = [4, 5, 6]
array_3 = [7, 8, 9]
array_4 = [10, 11, 12]
array_5 = [1, 2, 3]
# Put all arrays together
all_arrays = np.stack([array_1, array_2, array_3, array_4, array_5])
# Compare all vs all
c = np.all(all_arrays[:, np.newaxis] == all_arrays, axis=-1)
# Keep only the strict upper triangle to drop self-comparisons and mirrored pairs
c = np.triu(c, 1)
# Get matching pairs
m = np.stack(np.where(c), axis=1)
# One row per matching pair
print(m)
# [[0 4]]
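To see what the all-vs-all comparison produces before np.triu is applied, here is the same data with the intermediate shapes printed. This only illustrates the broadcasting step; pairwise and c are names chosen to mirror the answer's code:

```python
import numpy as np

all_arrays = np.stack([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [1, 2, 3]])

# (5, 1, 3) against (5, 3) broadcasts to (5, 5, 3):
# entry [i, j, k] is all_arrays[i, k] == all_arrays[j, k]
pairwise = all_arrays[:, np.newaxis] == all_arrays
print(all_arrays[:, np.newaxis].shape, all_arrays.shape, pairwise.shape)
# (5, 1, 3) (5, 3) (5, 5, 3)

# Reducing over the last axis gives a 5x5 "array i equals array j" matrix
c = np.all(pairwise, axis=-1)
print(c.astype(int))
```

The diagonal of c is always True (every array equals itself) and c is symmetric, so both (0, 4) and (4, 0) are set; that is why the answer keeps only np.triu(c, 1).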
This makes more comparisons than necessary, though: array_1 is compared with array_2 and array_2 with array_1 again, and every array is also compared with itself. You can also use something like scipy.spatial.distance.pdist, which evaluates each pair only once, to potentially save some time:
import numpy as np
import scipy.spatial.distance
array_1 = [1, 2, 3]
array_2 = [4, 5, 6]
array_3 = [7, 8, 9]
array_4 = [10, 11, 12]
array_5 = [1, 2, 3]
# Put all arrays together
all_arrays = np.stack([array_1, array_2, array_3, array_4, array_5])
# Compute pairwise Hamming distances (fraction of positions that differ); 0 means identical
d = scipy.spatial.distance.pdist(all_arrays, 'hamming')
d = scipy.spatial.distance.squareform(d)
# Get indices of pairs where it is zero
c = np.triu(d == 0, 1)
m = np.stack(np.where(c), axis=1)
print(m)
# [[0 4]]
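A different technique, not from the answer above: np.unique with axis=0 sorts the rows, so identical arrays collapse into the same unique row without checking every pair. A sketch, assuming all arrays have the same length so they can be stacked; groups is a hypothetical name:

```python
import numpy as np

all_arrays = np.stack([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [1, 2, 3]])

# inverse[i] is the id of the unique row that row i maps to;
# counts[u] is how many input rows map to unique row u
_, inverse, counts = np.unique(all_arrays, axis=0, return_inverse=True, return_counts=True)
# Some NumPy versions return inverse with an extra dimension; ravel() makes it 1-D
inverse = inverse.ravel()

# Rows whose unique-row id occurs more than once form a duplicate group
groups = [np.flatnonzero(inverse == u) for u in np.flatnonzero(counts > 1)]
print(groups)  # [array([0, 4])]
```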
Upvotes: 0