solub
solub

Reputation: 1363

Fastest way to compute distances between consecutive vectors with numpy/scipy

I have a line growth algorithm where I need to:

enter image description here

I usually do this in a very naive manner (see code below) and would like to know how to compute distances between consecutive vectors the fastest way possible using numpy (and scipy if needed).

import math


threshold = 10
vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]

for i in xrange(len(vectorList)):
    p1 = vectorList[i]
    p2 = vectorList[i+1]
    d = math.sqrt((p2[0] - p1[0])**2 + (p2[1] - p1[1])**2)
    if d >= threshold:
        pmid = ((p1[0] + p2[0]) * .5, (p1[1] + p2[1]) * .5)
        vectorList.insert(i+1, pmid)

EDIT: I've come up with the following workaround but I'm still concerned with the distance computation.

I would need to compute the distances between a vector and its next neighbor in a list instead of calculating a whole distance matrix (all vectors against each others) as I'm doing here.

import numpy as np

vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]
arr = np.asarray(vectorList).astype(float)


dis = distance.cdist(arr, arr).diagonal(1)
idx = np.where(dis > 10)[0]
vec = (arr[idx] + arr[idx+1]) * .5
arr = np.insert(arr, idx+1, vec, 0)

# output
array([[ 0. , 10. ],[ 4. ,  8. ],[ 9. , 11. ],[14. , 14. ],[16. , 19. ],[25.5, 17.5],[35. , 16. ]])

Upvotes: 1

Views: 1422

Answers (2)

dubbbdan
dubbbdan

Reputation: 2740

Here is an approach using sklearn NearestNeighbors

from sklearn.neighbors import NearestNeighbors
import numpy as np

def distance(p1,p2):
    return np.sqrt(np.sum(np.diff([p1,p2], axis=0)**2,1))

def find_shortest_distance(vectorList):
    nbrs = NearestNeighbors(n_neighbors=2, metric=distance).fit(vectorList)
    distances, indices = nbrs.kneighbors(vectorList)

    result = distances[:, 1]
    return result

vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]

which returns:

result
Out[57]: array([ 4.47213595,  4.47213595,  5.38516481,  5.38516481, 19.23538406])

This might be overkill for your problem, but this solution doesn't require the points to be in consecutive order.

Timings (slower than numpy):

%timeit find_shortest_distance(vectorList)
478 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Upvotes: 0

hpaulj
hpaulj

Reputation: 231395

In [209]: vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39,50)] 
In [210]: vectorList                                                            
Out[210]: [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39, 50)]

I added a point, making 3 possible insertion points.

In [212]: np.diff(vectorList, axis=0)                                           
Out[212]: 
array([[ 4, -2],
       [10,  6],
       [ 2,  5],
       [19, -3],
       [ 4, 34]])
In [213]: np.sum(np.diff(vectorList, axis=0)**2,1)                              
Out[213]: array([  20,  136,   29,  370, 1172])

distances:

In [214]: np.sqrt(np.sum(np.diff(vectorList, axis=0)**2,1))                     
Out[214]: array([ 4.47213595, 11.66190379,  5.38516481, 19.23538406, 34.23448554])

Mean values:

In [216]: arr = np.array(vectorList)                                            
In [217]: arr                                                                   
Out[217]: 
array([[ 0, 10],
       [ 4,  8],
       [14, 14],
       [16, 19],
       [35, 16],
       [39, 50]])

In [218]: (arr[:-1]+arr[1:])/2                                                  
Out[218]: 
array([[ 2. ,  9. ],
       [ 9. , 11. ],
       [15. , 16.5],
       [25.5, 17.5],
       [37. , 33. ]])

I could do something similar without diff:

d = np.sqrt(np.sum((arr[1:]-arr[:-1])**2,1)) 

The jumps that exceed the threshhold:

In [224]: idx = np.nonzero(d>10)                                                
In [225]: idx                                                                   
Out[225]: (array([1, 3, 4]),)
In [227]: _218[idx]       # the mean values to insert                                                            
Out[227]: 
array([[ 9. , 11. ],
       [25.5, 17.5],
       [37. , 33. ]])

Use np.insert to insert all values.

In [232]: np.insert(arr.astype(float), idx[0]+1, _227, axis=0)                  
Out[232]: 
array([[ 0. , 10. ],
       [ 4. ,  8. ],
       [ 9. , 11. ],
       [14. , 14. ],
       [16. , 19. ],
       [25.5, 17.5],
       [35. , 16. ],
       [37. , 33. ],
       [39. , 50. ]])

Upvotes: 5

Related Questions