Reputation: 85
I am working on a KNN algorithm for a university assignment. At the moment I'm finding the Euclidean distance between each of the training vectors, stored as a SciPy lil_matrix (due to the sparseness of the values in the vectors), and a testing vector, stored as a 1 x n lil_matrix for the same reason.
To work out the Euclidean distance I'm doing the following:
for positiveIndex, positivesComparison in enumerate(positives):
    result.append((spatial.distance.euclidean(positivesComparison.todense(), sentenceVector.todense()), positiveIndex, 1))
where sentenceVector is a lil_matrix with 1 row, and positives is a lil_matrix of size n x m.
I want to work out something faster than going through the positives matrix row by row and evaluating the Euclidean distance every time. Ideally I'd compute the Euclidean distance between the whole positives matrix and the sentenceVector vector in one operation, and get back a 1 x m matrix of Euclidean distances. The reason is that the current approach is relatively slow, essentially O(n·m), and I need to run it for more than one test sentence. Is this possible, and if so, how would I do it?
Note: the task is to evaluate performance using different K values for the KNN algorithm, not the actual implementation of KNN (although we are not allowed to use KNN libraries for the task).
Upvotes: 2
Views: 4836
Reputation: 17797
You can compute batch Euclidean distances pretty easily:
In [10]: a = np.random.random(size=(4,5))
In [11]: b = np.random.random(size=(1,5))
In [12]: from scipy.spatial.distance import euclidean
In [13]: [euclidean(aa, b) for aa in a]
Out[13]: [1.1430615949614429, 0.568517046878056, 1.3302284168375587, 1.0581730230363529]
In [14]: np.sqrt(np.sum((a - b)**2, axis=1))
Out[14]: array([ 1.1431, 0.5685, 1.3302, 1.0582])
But we want to use sparse matrices, which makes things a little more difficult:
In [22]: import scipy.sparse as ss
In [23]: sa = ss.lil_matrix(a)
In [24]: sb = ss.lil_matrix(b)
In [25]: np.sqrt(np.sum((sa - sb)**2, axis=1)) # <-- ValueError: inconsistent shapes
It's possible to do, but you'll need to use some tricks.
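One such trick (a sketch, not the only option): expand the squared distance as ||x - y||^2 = ||x||^2 - 2 x·y + ||y||^2, so you only ever need sparse products and row sums and never a broadcast sparse subtraction:

```python
import numpy as np
import scipy.sparse as ss

rng = np.random.default_rng(0)
a = rng.random((4, 5))
b = rng.random((1, 5))
sa = ss.csr_matrix(a)  # CSR rather than LIL, per the advice below
sb = ss.csr_matrix(b)

# Row-wise squared norms and cross terms, all computed sparsely.
sq_a = np.asarray(sa.multiply(sa).sum(axis=1)).ravel()   # ||a_i||^2 per row
sq_b = np.asarray(sb.multiply(sb).sum(axis=1)).ravel()   # ||b||^2
cross = np.asarray(sa.dot(sb.T).todense()).ravel()       # a_i . b per row
# Clip tiny negative values from floating-point round-off before sqrt.
dists = np.sqrt(np.maximum(sq_a - 2 * cross + sq_b, 0))
```

This gives one distance per row of `sa`, matching the dense `np.sqrt(np.sum((a - b)**2, axis=1))` result.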
More importantly, you should look at how big (and how sparse) your vectors really are. You might be faster leaving everything dense, and it will certainly save you some headache.
Finally, I'd avoid using LIL-format matrices, as they're one of the slowest formats available. For your case, look into CSR format.
EDIT: I forgot the simplest solution: use scikit-learn!
In [36]: from sklearn.metrics import pairwise_distances
In [37]: pairwise_distances(a, b)
Out[37]:
array([[ 1.1431],
[ 0.5685],
[ 1.3302],
[ 1.0582]])
In [38]: pairwise_distances(sa, sb)
Out[38]:
array([[ 1.1431],
[ 0.5685],
[ 1.3302],
[ 1.0582]])
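And since this is for KNN, here is a sketch of how the batch distances plug into a k-nearest-neighbours vote (the `train`/`labels` data here are made up for illustration, not the asker's):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
train = rng.random((6, 5))
labels = np.array([1, 0, 1, 1, 0, 0])  # hypothetical class labels
test = rng.random((1, 5))

k = 3
d = pairwise_distances(train, test).ravel()     # one distance per training row
nearest = np.argsort(d)[:k]                     # indices of the k smallest
prediction = np.bincount(labels[nearest]).argmax()  # majority vote
```

`pairwise_distances` accepts sparse input as well, so the same code works with CSR training data.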
Upvotes: 4