TheArcadeFire

Reputation: 57

Optimize search of closest four elements in two 3D arrays

I have two numpy arrays filled with 3D coordinates (x, y, z). For each point of the "source" array I need to find the 4 closest points in the "target" array. I have no problem getting correct results with several different methods, but I want to speed up the process as much as I can.

I need this because I am working on a Maya tool that transfers information stored in each vertex of one mesh to a second mesh, and the two meshes might have different numbers of vertices.

At this point, though, it is more of a Python problem than a Maya one, since my main bottleneck is the time spent finding the vertex matches.

The number of elements can vary from a few hundred to hundreds of thousands, and I want to make sure I find the best way to speed up the search. I would like my tool to be as quick as possible, since it might be used very often, and waiting minutes every single time it has to run would be quite annoying.

I found some useful answers that got me in the right direction:

Here I found out about KDTrees and different algorithms, and here I found some useful considerations on multithreading.

Here is some code that simulates the kind of scenario I would be working with and a few of the solutions I tried.

import timeit
import numpy as np
from multiprocessing.pool import ThreadPool
from scipy import spatial

# brute force
def bruteForce():
    results = []
    for point in sources:
        dists = ((targets - [point]) ** 2).sum(axis=1)  # squared distances to every target
        ndx = dists.argsort()  # indirect sort
        results.append(list(zip(ndx[:4], dists[ndx[:4]])))  # 4 closest indices and squared distances
    return results

# thread pool implementation
def threaded():
    def worker(point):
        dists = ((targets - [point]) ** 2).sum(axis=1)  # squared distances to every target
        ndx = dists.argsort()  # indirect sort
        return list(zip(ndx[:4], dists[ndx[:4]]))

    pool = ThreadPool()
    return pool.map(worker, sources)

# KDTree implementation
def kdTree():
    tree = spatial.KDTree(targets, leafsize=50)
    return [tree.query(point, k=4) for point in sources]

# define the number of points for the two arrays
n_targets = 40000  
n_sources = 40000  

#pick some random points
targets = np.random.rand(n_targets, 3) * 100
sources = np.random.rand(n_sources, 3) * 100

print('KDTree:       %s' % timeit.Timer(lambda: kdTree()).repeat(1, 1)[0])
print('bruteforce:   %s' % timeit.Timer(lambda: bruteForce()).repeat(1, 1)[0])
print('threaded:     %s' % timeit.Timer(lambda: threaded()).repeat(1, 1)[0])

My results are:

KDTree:       10.724864464  seconds
bruteforce:   211.427750433 seconds
threaded:     47.3280865123 seconds

The most promising method seems to be the KDTree. At first I thought that by using some threads to split the work of the KDTree into separate tasks, I could speed up the process even more. However, after a quick test using a basic threading.Thread implementation, it seemed to perform even worse when the KDTree was being computed in a thread. Reading this scipy example I can see that KDTrees are not really suitable for use in parallel threads, but I did not really understand why.

I was wondering, then, if there is any other way to make this code run faster, maybe by using multiprocessing or some other kind of trick to process my arrays in parallel.

Thanks in advance for the help!

Upvotes: 0

Views: 837

Answers (2)

Michael Butscher

Reputation: 10959

For large enough numbers of source points, multiprocessing may give a speed gain, since each worker process runs in its own interpreter and is not serialized by the GIL the way threads are. A crucial point is that each subprocess must hold a copy of the KDTree. On Linux (which supports fork) this happens automatically if the subprocesses are created after the tree has been built.

On Windows the tree must either be pickled and sent to the subprocesses, as happens automatically when it is passed as a parameter to the subprocess (this only seems to work for cKDTree but not for KDTree), or the tree must be created from scratch in each process.
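For the create-from-scratch variant, a minimal sketch could look like the snippet below (the function names here are illustrative, not part of the tested code further down): each process receives only the raw coordinate array, which pickles cheaply, and builds its own tree in the pool initializer.

import numpy as np
from multiprocessing.pool import Pool
from scipy import spatial

def buildKdTree(points):
    # Each worker process builds its own cKDTree from the raw coordinates,
    # so only a plain numpy array has to cross the process boundary.
    global kdtree
    kdtree = spatial.cKDTree(points, leafsize=50)

def rebuild_worker(point):
    return kdtree.query(point, k=4)

def multiprocRebuild(targets, sources):
    pool = Pool(initializer=buildKdTree, initargs=(targets,))
    return pool.map(rebuild_worker, sources)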

The following code shows the pickling variant, comparing a multiprocess cKDTree against a single process.

import timeit
import numpy as np
from multiprocessing.pool import Pool
from scipy import spatial


# cKDTree implementation
def ckdTree():
    tree = spatial.cKDTree(targets, leafsize=50)
    return [tree.query(point, k=4) for point in sources]


# Initialization to transfer the kdtree to each worker process
def setKdTree(tree):
    global kdtree
    kdtree = tree

# The worker must be defined at module level so multiprocessing can pickle it
def multiprocKd_worker(point):
    return kdtree.query(point, k=4)


# cKDTree process pool implementation
def multiprocCKd():
    tree = spatial.cKDTree(targets, leafsize=50)

    pool = Pool(initializer=setKdTree, initargs=(tree,))
    return pool.map(multiprocKd_worker, sources)


if __name__ == "__main__":
    # define the number of points for the two arrays
    n_targets = 40000
    n_sources = 40000

    #pick some random points
    targets = np.random.rand(n_targets, 3) * 100
    sources = np.random.rand(n_sources, 3) * 100


    print('cKDTree:   %s' % timeit.Timer(lambda: ckdTree()).repeat(1, 1)[0])
    print('multiprocCKd:   %s' % timeit.Timer(lambda: multiprocCKd()).repeat(1, 1)[0])
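One caveat with the code above: mapping one point per task means a lot of inter-process traffic. Since cKDTree.query also accepts whole arrays, a variant that sends chunks of points instead may scale better. A sketch building on the setKdTree initializer above (the chunk count of 64 is an arbitrary choice, not benchmarked here):

# cKDTree process pool implementation, querying chunks instead of single points
def multiprocKd_chunk_worker(chunk):
    # query is vectorized, so one call handles a whole chunk of points
    return kdtree.query(chunk, k=4)

def multiprocCKdChunked():
    tree = spatial.cKDTree(targets, leafsize=50)
    pool = Pool(initializer=setKdTree, initargs=(tree,))
    chunks = np.array_split(sources, 64)  # a few chunks per core cuts IPC overhead
    return pool.map(multiprocKd_chunk_worker, chunks)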

Upvotes: 0

Paul Panzer

Reputation: 53029

There is one very simple but extremely effective thing you can do: switch from KDTree to cKDTree. The latter is a Cython drop-in replacement for the former, which is implemented in pure Python.

Also note that .query is vectorized, so there is no need for a list comprehension.

from timeit import timeit

import numpy as np
import scipy.spatial as ss

a = np.random.random((40000, 3))
b = np.random.random((40000, 3))

tree_py = ss.KDTree(a)
tree_cy = ss.cKDTree(a)

# timings converted to milliseconds per query call
timeit(lambda: tree_cy.query(b, k=4), number=10) * 100
# 71.06744810007513
timeit(lambda: tree_py.query(b, k=4), number=1) * 1000
# 13309.359921026044

So that is an almost 200x speedup for free.
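As a further note (not benchmarked here): depending on the installed scipy version, cKDTree.query can also parallelize a single vectorized query across cores, via the n_jobs argument in older releases or workers in newer ones. A minimal sketch:

# parallel query across all cores; use n_jobs=-1 instead on older scipy versions
dist, idx = tree_cy.query(b, k=4, workers=-1)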

Upvotes: 1
