Reputation: 41
I am trying to calculate pairwise euclidean distances between two sets of D-Dimensional Vectors and trying to optimize the runtime performance of the below naive implementation. Basically the below code returns the pairwise squared euclidean distance between each row in descriptors(AxD) with each row in codebook (BxD) returning a result (AxB) matrix.
By moving from python to cython based implementation i see a good improvement. I am new to cython and was wondering if this implementation could be further optimized to bring performance closer to pure c or java based implementation.
pairwise_dist.pyx
import numpy as np
cimport numpy as np
import math
import cython
np.import_array()
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False)
@cython.wraparound(False)
def _pdist( np.ndarray[DTYPE_t, ndim=2] codebook,
np.ndarray[DTYPE_t, ndim=2] descriptors,
np.ndarray[DTYPE_t, ndim=2] result ):
cdef int i, j, k
cdef int codebook_size = codebook.shape[0]
cdef int n_descriptors = descriptors.shape[0]
cdef int n_dimensions = codebook.shape[1]
cdef float dist, diff
for i in xrange(n_descriptors):
for j in xrange(codebook_size):
dist = 0.0
for k in xrange(n_dimensions):
diff = codebook[j,k] - descriptors[i,k]
dist = dist + diff*diff
result[i,j] = dist
test.py:
import time
import numpy
CODEBOOK_SIZE = 4096
CODEBOOK_DIM = 64
DESC_DIM = 64
DESC_SIZE = 4000
codebook = numpy.random.randint(256,size=(CODEBOOK_SIZE,CODEBOOK_DIM))
descriptors = numpy.random.randint(256,size=(DESC_SIZE,DESC_DIM))
codebook = codebook.astype(numpy.float64)
descriptors = descriptors.astype(numpy.float64)
result = numpy.zeros((descriptors.shape[0],codebook.shape[0]),codebook.dtype)
from pairwise_dist import _pdist
stime = time.time()
_pdist(codebook,descriptors,result)
etime = time.time()
print "Time taken: "+ str(stime-etime)
build.py:
from distutils.core import setup
from Cython.Build import cythonize
setup(
name = "pairwise_dist",
ext_modules = cythonize('pairwise_dist.pyx'),
)
python build_cython.py build_ext --inplace
Upvotes: 1
Views: 1066
Reputation: 12765
Are you using Cython for educational reasons? Otherwise, I'd recommend using the scipy.spatial module
import numpy as np
from scipy.spatial.distance import pdist, squareform
object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
list_of_objects = [object_1, object_2, object_3]
# make a 4x3 matrix from list of objects
X = np.array(list_of_objects)
#calculate pairwise distances
distances = pdist(X)
#make a square matrix from result
distances_as_2d_matrix = squareform(distances)
print distances
print distances_as_2d_matrix
pdist
supports multiple metrics, euclidean being only one.
Upvotes: 1