yangj257

Reputation: 21

Why does scipy.spatial.cKDTree run slower than scipy.spatial.KDTree?

Normally, scipy.spatial.cKDTree runs much faster than scipy.spatial.KDTree.

But in my case, scipy.spatial.cKDTree runs slower than scipy.spatial.KDTree. My code is as follows:

import numpy as np
from laspy.file import File
from scipy import spatial
from timeit import default_timer as timer
inFile = File("Toronto_Strip_01.las")
dataset = np.vstack([inFile.x, inFile.y, inFile.z]).transpose()
print(dataset.shape)
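# Time only the cKDTree construction; the query below is not timed
start = timer()
tree = spatial.cKDTree(dataset)
# alternative: spatial.cKDTree(dataset, balanced_tree=False)
end = timer()
distance, index = tree.query(dataset[100, :], k=5)
print(distance, index)
print(end - start)

# Same measurement for the pure-Python KDTree
start = timer()
tree = spatial.KDTree(dataset)
end = timer()
dis, indices = tree.query(dataset[100, :], k=5)
print(dis, indices)
print(end - start)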

dataset.shape is (2727891, 3) and dataset.max() is 4834229.32.
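The real data and the random test data have similar size and magnitude, so one difference worth checking is their structure. LAS files store coordinates as scaled integers, so LiDAR points are quantized (the two decimals in 4834229.32 are consistent with that) and may contain exact duplicates. The helper below is a hypothetical diagnostic (`describe_points` is not part of SciPy or laspy), and whether duplicates or repeated values actually explain the slowdown is an assumption to test, not an established cause.

```python
import numpy as np


def describe_points(points):
    """Summarize properties of a point cloud that may affect k-d tree building.

    Hypothetical helper for diagnosis only; it does not build a tree.
    """
    points = np.asarray(points, dtype=np.float64)
    n = points.shape[0]
    # np.unique with axis=0 treats each (x, y, z) row as a single item
    unique_rows = np.unique(points, axis=0)
    n_duplicate_rows = n - unique_rows.shape[0]
    # Number of distinct coordinate values along each axis
    distinct_per_axis = [np.unique(points[:, i]).size for i in range(points.shape[1])]
    # Size of the bounding box along each axis
    extent = points.max(axis=0) - points.min(axis=0)
    return {
        "n_points": n,
        "n_duplicate_rows": n_duplicate_rows,
        "distinct_per_axis": distinct_per_axis,
        "extent": extent,
    }
```

Comparing `describe_points(dataset)` with `describe_points(A)` from the random test below would show whether the LiDAR points have many more duplicates, or far fewer distinct values per axis, than uniform random data.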

However, in a test case with random data, scipy.spatial.cKDTree runs much faster than scipy.spatial.KDTree. The code is as follows:

import numpy as np
from timeit import default_timer as timer
from scipy import spatial
np.random.seed(0)
A = np.random.random((2000000,3))*2000000
start1 = timer()
kdt=spatial.KDTree(A)
end1 = timer()
distance,index = kdt.query(A[100,:],k=5)
print(distance,index)
print(end1-start1)

start2 = timer()
kdt = spatial.cKDTree(A)  # cKDTree + outside construction
end2 = timer()
distance,index = kdt.query(A[100,:],k=5)
print(distance,index)
print(end2-start2)
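To check whether the data distribution, rather than the data size, changes which tree builds faster, one could rerun the same benchmark on uniform points and on a quantized copy of them. This is a sketch under assumptions: `build_time` is a hypothetical helper, and the reduced point count and the 20000-unit grid step are arbitrary choices for illustration. Also note that in SciPy 1.6 and later `KDTree` is built on the `cKDTree` implementation, so the two timings converge there; the comparison is mainly meaningful on an older release such as the 1.1.0 used here.

```python
import numpy as np
from timeit import default_timer as timer
from scipy import spatial


def build_time(tree_cls, points, **kwargs):
    """Return the wall-clock seconds spent constructing one tree (hypothetical helper)."""
    start = timer()
    tree_cls(points, **kwargs)
    return timer() - start


rng = np.random.RandomState(0)
n = 100000  # smaller than the question's 2,000,000 to keep the run short
uniform = rng.random_sample((n, 3)) * 2000000
# Assumed stand-in for quantized data: snap every coordinate to a coarse grid,
# leaving at most 101 distinct values per axis and many repeated values
quantized = np.round(uniform / 20000.0) * 20000.0

results = {}
for name, pts in [("uniform", uniform), ("quantized", quantized)]:
    results[name] = {
        "cKDTree": build_time(spatial.cKDTree, pts),
        "KDTree": build_time(spatial.KDTree, pts),
    }
    print(name, results[name])
```

If the quantized run reproduces the slowdown seen with the LAS file, that points at the data's structure rather than at the SciPy or Cython installation.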

Here is my question: do I need to preprocess the dataset in my code to speed up cKDTree?

My Python version is 3.6.5, SciPy is 1.1.0, and Cython is 0.28.4.

Upvotes: 2

Views: 977

Answers (1)

danwild

Reputation: 2046

This is perhaps more of a long comment, but you should consider how the cKDTree parameters affect performance with your particular dataset.

In particular, look at balanced_tree and compact_nodes, as pointed out here.
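According to the SciPy documentation, both options default to True: balanced_tree=True splits at the median instead of the midpoint, and compact_nodes=True shrinks each hyperrectangle to the actual data range. Both usually give faster queries at the cost of a longer build. A minimal sketch for timing all four combinations on your own points might look like this (`sweep_ckdtree_options` is a hypothetical helper, not a SciPy function):

```python
import itertools

import numpy as np
from timeit import default_timer as timer
from scipy import spatial


def sweep_ckdtree_options(points, query_point, k=5):
    """Time cKDTree construction for every balanced_tree/compact_nodes combination."""
    timings = {}
    for balanced, compact in itertools.product([True, False], repeat=2):
        start = timer()
        tree = spatial.cKDTree(points, balanced_tree=balanced, compact_nodes=compact)
        elapsed = timer() - start
        # The options change the tree's shape, not the result: the k nearest
        # distances should match across all combinations
        dist, _ = tree.query(query_point, k=k)
        timings[(balanced, compact)] = (elapsed, dist)
    return timings


if __name__ == "__main__":
    rng = np.random.RandomState(1)
    pts = rng.random_sample((5000, 3))
    for options, (seconds, _) in sweep_ckdtree_options(pts, pts[100]).items():
        print("balanced_tree=%s, compact_nodes=%s: %.4f s" % (options + (seconds,)))
```

On the LAS data you would call `sweep_ckdtree_options(dataset, dataset[100, :])`. Which combination wins depends on the point distribution, so it is worth measuring rather than assuming.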

Upvotes: 2
