NEE2000

Reputation: 51

What is the best method for optimizing the runtime of a kd-tree in Python?

I'm currently using SciPy's scipy.spatial.KDTree to perform nearest-neighbor lookups between two large sets of earth science data. One is a collection of storm reports, each with a specific lat/lon attached; the other is land-use data on a 1 km x 1 km grid covering half of the United States.

I've performed kd-tree operations on similar datasets with roughly 4.4 * 10^7 points, and the tree built successfully in approximately 160 seconds. However, when I try to build a kd-tree from this dataset (approximately 1.6 * 10^8 points), my kernel simply times out. I'm aware that a kd-tree offers O(log n) query time, with an O(n log n) build, but I'm not too familiar with the finer workings of big-O notation, so I'm unsure whether going from 4.4 * 10^7 to 1.6 * 10^8 points should cause this kind of blow-up in runtime.
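
For reference, here is a minimal sketch of the kind of build and query I'm attempting (the array names and the random stand-in data are placeholders for my real grid and storm reports):

    import numpy as np
    from scipy.spatial import cKDTree  # C implementation; much faster than the pure-Python KDTree

    rng = np.random.default_rng(0)
    # Stand-ins for the real data: (n, 2) arrays of lat/lon pairs
    grid_points = rng.uniform([25, -125], [49, -67], size=(1_000_000, 2)).astype(np.float32)
    reports = rng.uniform([25, -125], [49, -67], size=(10_000, 2))

    tree = cKDTree(grid_points)      # this build step is what times out at ~1.6e8 points
    dist, idx = tree.query(reports)  # index of the nearest grid point for each storm report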

Is this timeout something I could avoid by partitioning the data better before building the kd-tree, or does it seem to be a fluke?

Thanks in advance!

Upvotes: 0

Views: 956

Answers (1)

denis

Reputation: 21947

Can you put your data on a regular grid, 1 km squares with high values over oceans? Then scipy.interpolate.RegularGridInterpolator is way faster than KDTree -- there's no "build tree" step at all. (The lower 48 states run from latitudes of about 25 to 49 degrees and longitudes of -125 to -67, so fewer than 17 million 1 km squares -- that should fit in memory; use np.float32.)
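
A rough sketch of that idea, with made-up grid spacing and land-use values standing in for your real data (method="nearest" does a pure grid lookup, no tree):

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    # Made-up regular lat/lon grid over the lower 48 (coarser than 1 km to keep the demo small)
    lats = np.linspace(25.0, 49.0, 2700, dtype=np.float32)
    lons = np.linspace(-125.0, -67.0, 6400, dtype=np.float32)
    land_use = np.zeros((lats.size, lons.size), dtype=np.float32)  # your real values here; high over oceans

    # method="nearest" returns the value of the nearest grid cell -- no build step at all
    lookup = RegularGridInterpolator((lats, lons), land_use, method="nearest")

    storm_reports = np.array([[35.2, -97.4], [40.0, -105.3]])  # example lat/lon query points
    values = lookup(storm_reports)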

Also, you could try KDTree(data, leafsize=100): faster tree build, slower queries. How many query points (storm reports) are there?
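
A sketch of the leafsize variant, again with placeholder data; the only change from a default build is the leafsize argument:

    import numpy as np
    from scipy.spatial import KDTree

    rng = np.random.default_rng(0)
    grid_points = rng.uniform([25, -125], [49, -67], size=(1_000_000, 2))  # stand-in grid
    reports = rng.uniform([25, -125], [49, -67], size=(10_000, 2))         # stand-in storm reports

    # Bigger leaves mean a shallower tree: the build gets faster, each query a bit slower
    tree = KDTree(grid_points, leafsize=100)
    dist, idx = tree.query(reports)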

Upvotes: 1
