Kalin Stoyanov

Reputation: 581

Why is scipy.spatial.distance.sqeuclidean twice as slow as numpy.sum((y1-y2)**2)?

Here is my code:

import numpy as np
import time
from scipy.spatial import distance

y1=np.array([0,0,0,0,1,0,0,0,0,0])
y2=np.array([0. , 0.1, 0. , 0. , 0.7, 0.2, 0. , 0. , 0. , 0. ])

start_time = time.time()
for i in range(1000000):
    distance.sqeuclidean(y1,y2)
print("--- %s seconds ---" % (time.time() - start_time))

--- 15.212640523910522 seconds ---

start_time = time.time()
for i in range(1000000):
    np.sum((y1-y2)**2)
print("--- %s seconds ---" % (time.time() - start_time))

--- 8.381187438964844 seconds ---

I assumed that SciPy is optimized, so I expected it to be faster.

Any comments will be appreciated.

Upvotes: 1

Views: 509

Answers (1)

Ehsan

Reputation: 12407

Here is a more comprehensive comparison (credit to @Divakar's benchit package):

def m1(y1,y2):
  return distance.sqeuclidean(y1,y2)

def m2(y1,y2):
  return np.sum((y1-y2)**2)

in_ = {n:[np.random.rand(n), np.random.rand(n)] for n in [10,100,1000,10000,20000]}


SciPy becomes more efficient as the arrays grow. For small arrays, the fixed overhead of calling the function most likely outweighs its benefit. According to the source, SciPy computes np.dot(y1-y2, y1-y2).
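As a quick sanity check, the three formulations should agree numerically. This is a minimal sketch, assuming NumPy and SciPy are installed; the array length of 1000 and the seed are arbitrary choices for illustration, not values from the benchmark above.

```python
import numpy as np
from scipy.spatial import distance

# Fixed seed and length are arbitrary, chosen only for reproducibility.
rng = np.random.default_rng(0)
y1 = rng.random(1000)
y2 = rng.random(1000)

diff = y1 - y2
via_scipy = distance.sqeuclidean(y1, y2)  # SciPy's squared Euclidean distance
via_sum = np.sum(diff ** 2)               # elementwise square, then sum
via_dot = np.dot(diff, diff)              # dot product of the difference with itself

# All three should match up to floating-point rounding.
agree = bool(np.isclose(via_scipy, via_sum) and np.isclose(via_sum, via_dot))
print(agree)
```

The results can differ in the last few bits because the summation order differs, which is why the comparison uses np.isclose rather than ==.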

If you want an even faster solution, use np.dot directly and skip the extra work and function-call overhead of the SciPy wrapper:

def m3(y1,y2):
  y_d = y1-y2
  return np.dot(y_d,y_d)
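
For readers without benchit, a rough equivalent of the comparison can be sketched with the standard-library timeit module. The helper name time_per_call, the sizes, and the repeat counts below are assumptions made for illustration; absolute timings will vary by machine and library version.

```python
import timeit

import numpy as np
from scipy.spatial import distance


def m1(y1, y2):
    return distance.sqeuclidean(y1, y2)


def m2(y1, y2):
    return np.sum((y1 - y2) ** 2)


def m3(y1, y2):
    y_d = y1 - y2
    return np.dot(y_d, y_d)


def time_per_call(func, y1, y2, number=2000):
    """Hypothetical helper: best-of-3 average seconds per call."""
    timer = timeit.Timer(lambda: func(y1, y2))
    return min(timer.repeat(repeat=3, number=number)) / number


rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
results = {}
for n in (10, 1000, 20000):
    y1, y2 = rng.random(n), rng.random(n)
    results[n] = {f.__name__: time_per_call(f, y1, y2) for f in (m1, m2, m3)}
    print(n, results[n])
```

Taking the minimum over several repeats reduces noise from other processes, which makes this more reliable than a single time.time() measurement around a loop.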


Upvotes: 6
