mad

Reputation: 2789

Elementwise comparison between two arrays without for loops in Numpy

I have a large NumPy array called dataset with shape (5600, 28, 28, 3). Suppose it looks like the one below:

>>> dataset=np.random.rand(5600,28,28,3)
>>> dataset.shape
(5600, 28, 28, 3)

Now suppose I have another, smaller array called query, which I want to search for in dataset:

>>> query=np.random.rand(28,28,3)
>>> query.shape
(28, 28, 3)

One way to find query in the larger array is to compute the mean squared error (MSE) between it and every element of dataset. The index of the smallest MSE tells me where query is in dataset.

The issue is that I don't want to write a Python for loop that calculates the MSEs one by one, stores them in another array, and then finds the position of the smallest MSE when the loop ends. This comparison already sits inside two for loops, so I would like to make it as efficient and fast as possible. Is it possible to solve this problem without another big for loop?

Upvotes: 0

Views: 953

Answers (3)

Dani Mesejo

Reputation: 61900

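For this you could use cdist with the squared Euclidean distance. It equals the MSE multiplied by the number of elements per item (28*28*3), so its minimum is at the same index: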

import numpy as np
from scipy.spatial.distance import cdist

dataset = np.random.rand(5600, 28, 28, 3)
query = np.random.rand(28, 28, 3)

# 'sqeuclidean' is the squared Euclidean distance ('seuclidean' is the standardized one)
res = cdist(query.reshape((1, -1)), dataset.reshape((5600, -1)), 'sqeuclidean')
print(np.argmin(res))
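
As a hedged sanity check, the sketch below plants a known match in a small random dataset and confirms that the squared Euclidean distance, divided by the number of elements per item, agrees with the MSE. The sizes and the names small_dataset, small_query, and best are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
small_dataset = rng.random((50, 28, 28, 3))   # assumed small size for a quick check
small_query = small_dataset[17].copy()        # plant a known match at index 17

# Flatten each (28, 28, 3) item into one row of length 28*28*3
flat_dataset = small_dataset.reshape(len(small_dataset), -1)
flat_query = small_query.reshape(1, -1)

# Squared Euclidean distance from the query to every item, shape (50,)
sq_dist = cdist(flat_query, flat_dataset, 'sqeuclidean')[0]

# Dividing by the number of elements per item gives the MSE
mse = sq_dist / flat_dataset.shape[1]

best = int(np.argmin(mse))
print(best)
```

Because the division is by a positive constant, np.argmin gives the same index for sq_dist and for mse.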

Upvotes: 1

AGawish

Reputation: 98

You can create a scanner function, apply it to every element of the dataset using map, and then extract the location of the minimum MSE from the result:

MSE_scanner = lambda A : ((query-A)**2).mean() # create the MSE comparison function
MSE_array = list(map(MSE_scanner, dataset)) # array of MSEs relevant to query
MSE_minimum = min(MSE_array) # extract the minimum MSE, which should be the one matched
query_location = MSE_array.index(MSE_minimum) # extract the location of the minimum MSE
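
Note that map still calls the function once per element in Python, so this is a loop in disguise. A minimal self-contained sketch of the same idea follows, collecting the results in a NumPy array and using np.argmin for the lookup. The small dataset and the planted match at index 5 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
dataset = rng.random((30, 28, 28, 3))   # assumed small size for illustration
query = dataset[5].copy()               # plant a known match at index 5


def mse_scanner(item):
    """Return the mean squared error between query and one dataset item."""
    return ((query - item) ** 2).mean()


# map calls mse_scanner once per item; fromiter collects the results into an array
mse_array = np.fromiter(map(mse_scanner, dataset), dtype=float, count=len(dataset))

# Index of the smallest MSE
query_location = int(np.argmin(mse_array))
print(query_location)
```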

Upvotes: 1

fountainhead

Reputation: 3722

You could do this:

se = (dataset-query)**2                            # Squared error - shape (L,28,28,3)
sum_of_se = np.sum(se.reshape(-1,28*28*3), axis=1) # Sum of squared error - shape (L,)
print(np.argmin(sum_of_se))                        # Position of minimum within sum_of_se
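
A possible variation, sketched under assumptions: the broadcasted difference above allocates a temporary array as large as dataset. Expanding the square as sum(d^2) - 2*sum(d*q) + sum(q^2) avoids that temporary, at the cost of some floating-point rounding. The sizes, the noisy planted match at index 42, and the names mse_direct and mse_expanded are illustrative and not from the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = rng.random((200, 28, 28, 3))                # assumed small size for illustration
query = dataset[42] + 1e-3 * rng.random((28, 28, 3))  # noisy copy of item 42

# Direct MSE via broadcasting; creates a temporary of shape (200, 28, 28, 3)
mse_direct = ((dataset - query) ** 2).mean(axis=(1, 2, 3))

# Expanded form: sum((d - q)^2) = sum(d^2) - 2*sum(d*q) + sum(q^2)
flat = dataset.reshape(len(dataset), -1)   # shape (200, 2352)
q = query.ravel()                          # shape (2352,)
sse = np.einsum('ij,ij->i', flat, flat) - 2.0 * (flat @ q) + q @ q
mse_expanded = sse / q.size

position = int(np.argmin(mse_direct))
print(position)
```

If the same dataset is searched with many queries, the sum(d^2) term could be computed once and reused.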

Upvotes: 2
