Reputation: 6899
I am curious as to why the following two cdist calls differ so much in run time even though they produce the same results:
import numpy as np
from scipy.spatial.distance import cdist
x = np.random.rand(10_000_000, 50)
y = np.random.rand(50)
result_1 = cdist(x, y[np.newaxis, :])
result_2 = cdist(x, y[np.newaxis, :], 'minkowski', p=2.)
Computing result_1 is significantly faster than computing result_2.
Upvotes: 3
Views: 608
Reputation: 1132
The C implementation of the Euclidean distance (source lines 50-66) uses multiplication and a sqrt() call, while the Minkowski distance (source lines 381-391) is based on much slower calls to the pow() function.
For reference, see the discussions here and here comparing pow() to multiplication and sqrt().
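To make the arithmetic difference concrete, here is a minimal NumPy sketch of the two formulas. It is not SciPy's C code; the helper names euclidean_rowwise and minkowski_rowwise are made up for illustration. It only shows that the Euclidean path needs a multiply and one square root per row, while the general Minkowski form needs a power call per element plus a final 1/p power.

```python
import numpy as np

def euclidean_rowwise(x, y):
    # Hypothetical helper: squared differences via multiplication,
    # then one square root per row.
    d = x - y
    return np.sqrt(np.sum(d * d, axis=1))

def minkowski_rowwise(x, y, p):
    # Hypothetical helper: general form, |d|**p for every element,
    # summed per row, then raised to the power 1/p.
    d = np.abs(x - y)
    return np.power(np.sum(np.power(d, p), axis=1), 1.0 / p)

rng = np.random.default_rng(0)
x = rng.random((1000, 50))
y = rng.random(50)

e = euclidean_rowwise(x, y)
m = minkowski_rowwise(x, y, 2.0)

# Both formulas agree for p = 2, up to floating-point rounding.
print(np.allclose(e, m))
```

The two functions return the same distances for p = 2; only the operations used to get there differ, which is where the cost gap in the C code comes from.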
So although the Python source makes it look as if the Euclidean norm just calls the Minkowski norm (source line 614), cdist actually calls straight through to the C implementation, where the two code paths are different. The Python euclidean function is not called during the actual execution.
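If you want to check this on your own machine, a rough timing sketch along these lines may help. The array size and repeat count are arbitrary choices, much smaller than in the question, and the size of the gap is likely to depend on your SciPy version and hardware, so treat the numbers as indicative only.

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
x = rng.random((20_000, 50))   # far smaller than the question's 10_000_000 rows
y = rng.random((1, 50))        # cdist expects two 2-D arrays

# Time the default metric (Euclidean) against Minkowski with p=2.
t_euclid = timeit.timeit(lambda: cdist(x, y), number=5)
t_mink = timeit.timeit(lambda: cdist(x, y, 'minkowski', p=2.), number=5)
print(f"euclidean: {t_euclid:.4f}s  minkowski(p=2): {t_mink:.4f}s")

# The two metrics should agree numerically for p = 2.
r_euclid = cdist(x, y)
r_mink = cdist(x, y, 'minkowski', p=2.)
same = np.allclose(r_euclid, r_mink)
print(same)
```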
Upvotes: 3