Reputation: 2789
I have one big NumPy array called dataset with shape (N, 28, 28, 3), i.e. N elements of shape (28, 28, 3). Suppose it looks like the one below:
>>> import numpy as np
>>> dataset = np.random.rand(5600, 28, 28, 3)
>>> dataset.shape
(5600, 28, 28, 3)
Now suppose I have another, smaller array called query that I want to search for in dataset:
>>> query=np.random.rand(28,28,3)
>>> query.shape
(28, 28, 3)
One way to find query in dataset is to compute the mean squared error (MSE) between query and every (28, 28, 3) element of dataset. The element with the smallest MSE tells me where query is located in dataset.
The issue is that I don't want to write a Python for loop that computes the MSEs one by one, stores them in another array, and then finds the position of the smallest MSE when the loop ends. This comparison already sits inside two other for loops, so I would like it to be as fast as possible. Is it possible to solve this problem without an explicit for loop?
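Written out, with notation introduced here only for illustration ($D_i$ is the element `dataset[i]`, $Q$ is `query`), the quantity being minimized is:

```latex
\mathrm{MSE}_i = \frac{1}{28 \cdot 28 \cdot 3} \sum_{h=1}^{28} \sum_{w=1}^{28} \sum_{c=1}^{3} \left( D_{i,h,w,c} - Q_{h,w,c} \right)^2,
\qquad
\hat{\imath} = \arg\min_i \mathrm{MSE}_i
```

Because the factor $1/2352$ is the same for every $i$, minimizing the plain sum of squared errors or the squared Euclidean distance gives the same index $\hat{\imath}$.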
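For reference, the loop described above might look like this minimal sketch. The random data and the planted exact copy of query are assumptions made only so the expected index is known:

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = rng.random((5600, 28, 28, 3))
query = dataset[1234].copy()  # hypothetical: plant an exact copy so the answer is known

# The explicit loop the question wants to avoid: one MSE per dataset element
mses = np.empty(len(dataset))
for i in range(len(dataset)):
    mses[i] = ((dataset[i] - query) ** 2).mean()

location = int(np.argmin(mses))
print(location)  # 1234, the only element with MSE exactly 0
```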
Upvotes: 0
Views: 953
Reputation: 61900
For this you could use scipy.spatial.distance.cdist with the squared Euclidean distance, which is proportional to the MSE, so its argmin gives the same position:
import numpy as np
from scipy.spatial.distance import cdist
dataset = np.random.rand(5600, 28, 28, 3)
query = np.random.rand(28, 28, 3)
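# 'sqeuclidean' is the squared Euclidean distance ('seuclidean' is the standardized Euclidean distance)
res = cdist(query.reshape(1, -1), dataset.reshape(len(dataset), -1), 'sqeuclidean')
print(np.argmin(res))  # res has shape (1, N), so the flat argmin is the index into dataset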
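As a quick sanity check (a sketch with random data and a smaller N, both assumptions), dividing the 'sqeuclidean' distances by the number of values per element, 28*28*3 = 2352, reproduces the MSE computed directly, so both give the same argmin:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
dataset = rng.random((500, 28, 28, 3))  # smaller N than in the question, for speed
query = rng.random((28, 28, 3))

flat = dataset.reshape(len(dataset), -1)  # shape (500, 2352)
d = flat.shape[1]                         # 2352 values per element

# Squared Euclidean distance from query to every row; [0] drops the leading axis of size 1
sq = cdist(query.reshape(1, -1), flat, 'sqeuclidean')[0]
mse_from_cdist = sq / d

# Direct MSE via broadcasting, averaged over the last three axes
mse_direct = ((dataset - query) ** 2).mean(axis=(1, 2, 3))

print(np.allclose(mse_from_cdist, mse_direct))
print(np.argmin(mse_from_cdist) == np.argmin(mse_direct))
```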
Upvotes: 1
Reputation: 98
You can create a scanner function that scans over the dataset using map, then extract the location of the minimum MSE from the resulting list:
MSE_scanner = lambda A: ((query - A) ** 2).mean()  # MSE between query and one dataset element
MSE_array = list(map(MSE_scanner, dataset))  # list of MSEs, one per dataset element
MSE_minimum = min(MSE_array)  # the minimum MSE, which should be the matching element
query_location = MSE_array.index(MSE_minimum)  # index of the minimum MSE in dataset
Upvotes: 1
Reputation: 3722
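You could do this with broadcasting, without a Python loop:
import numpy as np
se = (dataset - query) ** 2  # Squared error via broadcasting - shape (L, 28, 28, 3)
sum_of_se = np.sum(se.reshape(-1, 28*28*3), axis=1)  # Sum of squared errors - shape (L,)
print(np.argmin(sum_of_se))  # Position of the minimum; same as for the MSE, since MSE = SSE / (28*28*3)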
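Two variations on this idea, sketched under the assumption that everything fits in memory. First, `mean(axis=(1, 2, 3))` gives the MSE directly without the reshape. Second, since the search runs inside two outer loops, the identity ||a - q||² = ||a||² - 2 a·q + ||q||² (a swapped-in technique, not part of the answer above) lets you precompute the per-element squared norms once and then handle each new query with one matrix-vector product, instead of building the full (N, 28, 28, 3) difference array (roughly 105 MB in float64 for N = 5600). The helper names `mse_broadcast` and `make_searcher` are hypothetical:

```python
import numpy as np

def mse_broadcast(dataset, query):
    """MSE between query and every element of dataset, shape (N,)."""
    return ((dataset - query) ** 2).mean(axis=(1, 2, 3))

def make_searcher(dataset):
    """Hypothetical helper: precompute squared norms once, then return a
    function that finds the closest element for any query."""
    flat = dataset.reshape(len(dataset), -1)      # view, no copy for a contiguous array
    sq_norms = np.einsum('ij,ij->i', flat, flat)  # ||a_i||^2 for every row
    d = flat.shape[1]

    def search(query):
        q = query.ravel()
        # ||a - q||^2 = ||a||^2 - 2 a.q + ||q||^2, with no (N, d) temporary
        sq_dist = sq_norms - 2.0 * (flat @ q) + q @ q
        return int(np.argmin(sq_dist)), sq_dist / d  # index and per-element MSE
    return search

rng = np.random.default_rng(2)
dataset = rng.random((5600, 28, 28, 3))
query = dataset[42] + 1e-3 * rng.random((28, 28, 3))  # a slightly perturbed copy of element 42

search = make_searcher(dataset)
idx, mses = search(query)
print(idx, np.argmin(mse_broadcast(dataset, query)))  # both should print 42
```

The expansion trades memory for some floating-point cancellation when distances are tiny compared with the norms; for picking the argmin among clearly separated candidates this is usually harmless, but the broadcast version is the more exact of the two.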
Upvotes: 2