Reputation: 1363
I have a line growth algorithm where I need to:
I usually do this in a very naive manner (see code below) and would like to know how to compute distances between consecutive vectors the fastest way possible using numpy (and scipy if needed).
import math
threshold = 10
vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]
for i in xrange(len(vectorList)):
p1 = vectorList[i]
p2 = vectorList[i+1]
d = math.sqrt((p2[0] - p1[0])**2 + (p2[1] - p1[1])**2)
if d >= threshold:
pmid = ((p1[0] + p2[0]) * .5, (p1[1] + p2[1]) * .5)
vectorList.insert(i+1, pmid)
EDIT: I've come up with the following workaround but I'm still concerned with the distance computation.
I would need to compute the distances between a vector and its next neighbor in a list instead of calculating a whole distance matrix (all vectors against each others) as I'm doing here.
import numpy as np
vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]
arr = np.asarray(vectorList).astype(float)
dis = distance.cdist(arr, arr).diagonal(1)
idx = np.where(dis > 10)[0]
vec = (arr[idx] + arr[idx+1]) * .5
arr = np.insert(arr, idx+1, vec, 0)
# output
array([[ 0. , 10. ],[ 4. , 8. ],[ 9. , 11. ],[14. , 14. ],[16. , 19. ],[25.5, 17.5],[35. , 16. ]])
Upvotes: 1
Views: 1422
Reputation: 2740
Here is an approach using sklearn NearestNeighbors
from sklearn.neighbors import NearestNeighbors
import numpy as np
def distance(p1,p2):
return np.sqrt(np.sum(np.diff([p1,p2], axis=0)**2,1))
def find_shortest_distance(vectorList):
nbrs = NearestNeighbors(n_neighbors=2, metric=distance).fit(vectorList)
distances, indices = nbrs.kneighbors(vectorList)
result = distances[:, 1]
return result
vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]
which returns:
result
Out[57]: array([ 4.47213595, 4.47213595, 5.38516481, 5.38516481, 19.23538406])
This might be overkill for your problem, but this solution doesn't require the points to be in consecutive order.
Timings (slower than numpy):
%timeit find_shortest_distance(vectorList)
478 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 0
Reputation: 231395
In [209]: vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39,50)]
In [210]: vectorList
Out[210]: [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39, 50)]
I added a point, making 3 possible insertion points.
In [212]: np.diff(vectorList, axis=0)
Out[212]:
array([[ 4, -2],
[10, 6],
[ 2, 5],
[19, -3],
[ 4, 34]])
In [213]: np.sum(np.diff(vectorList, axis=0)**2,1)
Out[213]: array([ 20, 136, 29, 370, 1172])
distances:
In [214]: np.sqrt(np.sum(np.diff(vectorList, axis=0)**2,1))
Out[214]: array([ 4.47213595, 11.66190379, 5.38516481, 19.23538406, 34.23448554])
Mean values:
In [216]: arr = np.array(vectorList)
In [217]: arr
Out[217]:
array([[ 0, 10],
[ 4, 8],
[14, 14],
[16, 19],
[35, 16],
[39, 50]])
In [218]: (arr[:-1]+arr[1:])/2
Out[218]:
array([[ 2. , 9. ],
[ 9. , 11. ],
[15. , 16.5],
[25.5, 17.5],
[37. , 33. ]])
I could do something similar without diff
:
d = np.sqrt(np.sum((arr[1:]-arr[:-1])**2,1))
The jumps that exceed the threshhold:
In [224]: idx = np.nonzero(d>10)
In [225]: idx
Out[225]: (array([1, 3, 4]),)
In [227]: _218[idx] # the mean values to insert
Out[227]:
array([[ 9. , 11. ],
[25.5, 17.5],
[37. , 33. ]])
Use np.insert
to insert all values.
In [232]: np.insert(arr.astype(float), idx[0]+1, _227, axis=0)
Out[232]:
array([[ 0. , 10. ],
[ 4. , 8. ],
[ 9. , 11. ],
[14. , 14. ],
[16. , 19. ],
[25.5, 17.5],
[35. , 16. ],
[37. , 33. ],
[39. , 50. ]])
Upvotes: 5