Reputation: 581
Here is my code:
import numpy as np
import time
from scipy.spatial import distance
y1=np.array([0,0,0,0,1,0,0,0,0,0])
y2=np.array([0. , 0.1, 0. , 0. , 0.7, 0.2, 0. , 0. , 0. , 0. ])
start_time = time.time()
for i in range(1000000):
    distance.sqeuclidean(y1,y2)
print("--- %s seconds ---" % (time.time() - start_time))
--- 15.212640523910522 seconds ---
start_time = time.time()
for i in range(1000000):
    np.sum((y1-y2)**2)
print("--- %s seconds ---" % (time.time() - start_time))
--- 8.381187438964844 seconds ---
I assumed that SciPy's implementation would be optimized and therefore faster than the plain NumPy expression, but it takes almost twice as long here.
Any comments would be appreciated.
Upvotes: 1
Views: 509
Reputation: 12407
Here is a more comprehensive comparison (credit to @Divakar's benchit
package):
def m1(y1,y2):
    return distance.sqeuclidean(y1,y2)
def m2(y1,y2):
    return np.sum((y1-y2)**2)
in_ = {n: [np.random.rand(n), np.random.rand(n)] for n in [10, 100, 1000, 10000, 20000]}
SciPy becomes more efficient for larger arrays. For smaller arrays, the overhead of calling the function most likely outweighs its benefit. According to the source, SciPy computes np.dot(y1-y2, y1-y2).
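If benchit is not available, a rough version of the same comparison can be sketched with the standard library's timeit. The helper name time_per_call, the two array lengths, and the repeat counts are my own choices for illustration, not part of the benchmark above, and the absolute numbers will vary by machine and library version.

```python
import timeit

import numpy as np
from scipy.spatial import distance


def m1(y1, y2):
    return distance.sqeuclidean(y1, y2)


def m2(y1, y2):
    return np.sum((y1 - y2) ** 2)


def time_per_call(func, n, number=1000, repeat=5):
    """Best-of-`repeat` average seconds per call on random arrays of length n.

    Hypothetical helper for this sketch only.
    """
    rng = np.random.default_rng(0)
    y1, y2 = rng.random(n), rng.random(n)
    timer = timeit.Timer(lambda: func(y1, y2))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# One small and one large size, to see where per-call overhead matters.
timings = {
    n: {f.__name__: time_per_call(f, n) for f in (m1, m2)}
    for n in (10, 10000)
}
for n, row in timings.items():
    print(n, row)
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; it does not remove it.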
If you want an even faster solution, use np.dot directly, which avoids the overhead of the extra lines and the function call:
def m3(y1,y2):
    y_d = y1-y2
    return np.dot(y_d,y_d)
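Before swapping one approach for another, it is worth confirming that they agree. A minimal check on the vectors from the question, where the squared distance works out by hand to 0.1² + 0.3² + 0.2² = 0.14 (the dictionary keys are just labels I picked):

```python
import numpy as np
from scipy.spatial import distance

# Vectors from the question.
y1 = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
y2 = np.array([0.0, 0.1, 0.0, 0.0, 0.7, 0.2, 0.0, 0.0, 0.0, 0.0])

d = y1 - y2
results = {
    "sqeuclidean": distance.sqeuclidean(y1, y2),
    "sum_of_squares": np.sum(d ** 2),
    "dot": np.dot(d, d),
}

# Compare with a tolerance rather than ==, since these are floats.
assert all(np.isclose(v, 0.14) for v in results.values())
print(results)
```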
Upvotes: 6