Reputation: 1455
In Python I have two arrays, lon and lat, with more than 500 000 points each.
import numpy

lon = numpy.array(...)  # placeholder: your longitudes
lat = numpy.array(...)  # placeholder: your latitudes

flag = True
while flag:
    lon1 = lon[:-1]
    lon2 = lon[1:]
    lat1 = lat[:-1]
    lat2 = lat[1:]
    # distance between consecutive points
    x = (lon2 - lon1)
    y = (lat2 - lat1)
    d = numpy.sqrt(x * x + y * y)
    d_min = numpy.min(d)
    if d_min < 0.015:
        j = numpy.where(d == d_min)[0][0]
        lon[j] = (lon[j] + lon[j + 1]) / 2
        lat[j] = (lat[j] + lat[j + 1]) / 2
        lon = numpy.delete(lon, j + 1)
        lat = numpy.delete(lat, j + 1)
    else:
        flag = False
This code runs very slowly! Please advise how to make it faster.
I know there is scipy.weave. Please advise how to rewrite the code using scipy.weave:
import scipy
from scipy import weave
codeC = """
???
"""
scipy.weave.inline(codeC, ['lon', 'lat'], compiler='gcc')
Thanks.
Update:
import numpy as np

min_radius = 0.015 * 0.015  # squared distance threshold (0.015 in the original units)

lon1 = lon[:-1]
lon2 = lon[1:]
lat1 = lat[:-1]
lat2 = lat[1:]
x = (lon2 - lon1)
y = (lat2 - lat1)
d = x * x + y * y  # squared distances, computed only once
while d.size > 1:
    j = np.argmin(d)
    if d[j] > min_radius:
        break
    lon[j] = (lon[j] + lon[j + 1]) / 2
    lat[j] = (lat[j] + lat[j + 1]) / 2
    # previously: lon = np.delete(lon, j + 1)
    #             lat = np.delete(lat, j + 1)
    #             d = np.delete(d, j)
    if j == (d.size - 1):
        lon = lon[:j + 1]
        lat = lat[:j + 1]
        d = d[:j]
    else:
        lon = np.hstack((lon[:j + 1], lon[j + 2:]))
        lat = np.hstack((lat[:j + 1], lat[j + 2:]))
        d = np.hstack((d[:j], d[j + 1:]))
    if j > 0:
        x = lon[j] - lon[j - 1]
        y = lat[j] - lat[j - 1]
        d[j - 1] = x * x + y * y
    if j < d.size:  # also covers the case where j is now the last segment
        x = lon[j + 1] - lon[j]
        y = lat[j + 1] - lat[j]
        d[j] = x * x + y * y
@ilmiacs, @Jaime, @Luke, thanks! It works much quicker!
But on the whole the code still takes a very long time on an array of 500 000 points... :(((
Upvotes: 2
Views: 194
Reputation: 2576
You are redoing the whole expensive calculation on every iteration of your loop, even though a single iteration only removes one node: each time you remove an element you copy the whole arrays and recompute all the distances, and only then search for the next node to merge.
Thus your code is suboptimal, as it runs in at least O(N^2) time. However, for what you want to achieve, O(N log N) or possibly even O(N) would be sufficient.
A simple Python module without numpy should be fast enough if you rearrange the algorithm. Recipe:
- Compute the array of (squared) neighbour distances d once, up front.
- In a loop, find the smallest entry of d; stop as soon as it is above the threshold.
- Otherwise merge the corresponding pair of points, drop one point, and update only the one or two entries of d that involve the merged point, instead of recomputing the whole array.
Hope this helps.
EDIT:
An example implementation without recalculating the distance array would be this:
import numpy

lon = numpy.array(...)  # placeholder: your longitudes
lat = numpy.array(...)  # placeholder: your latitudes

lon1 = lon[:-1]
lon2 = lon[1:]
lat1 = lat[:-1]
lat2 = lat[1:]
# squared distances, computed only once
x = (lon2 - lon1)
y = (lat2 - lat1)
d = x * x + y * y
while d.size > 0:
    d_min = numpy.min(d)
    if d_min < 0.015 * 0.015:
        j = numpy.where(d == d_min)[0][0]
    else:
        break
    lon[j] = (lon[j] + lon[j + 1]) / 2
    lat[j] = (lat[j] + lat[j + 1]) / 2
    lon = numpy.delete(lon, j + 1)  # <-- you spend most of the time here
    lat = numpy.delete(lat, j + 1)  # <-- and here
    d = numpy.delete(d, j)          # <-- and here
    # only the one or two distances adjacent to the merged point change
    if j > 0:
        x = lon[j] - lon[j - 1]
        y = lat[j] - lat[j - 1]
        d[j - 1] = x * x + y * y
    if j < d.size:
        x = lon[j + 1] - lon[j]
        y = lat[j + 1] - lat[j]
        d[j] = x * x + y * y
Although this will give you a boost, you are still at O(N^2) because of the extensive copying. To avoid that, you would need to store your points in a doubly linked list; removing an element is then O(1) instead of O(N). With the linked list, your most expensive operation becomes finding the minimum. That you could improve by introducing a sorted version of d in which you also keep track of each entry's position in d; you would need to keep it up to date after every merge, though. But finally you would arrive at an algorithm that is O(N log N).
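To make the direction concrete, here is a minimal sketch of that idea. It emulates the doubly linked list with prev/next index arrays and, instead of an explicitly managed sorted copy of d, uses a heap (heapq) with lazy invalidation of stale entries, which gives the same O(N log N) bound. All names (merge_close_points, min_radius, and so on) are my own choices, not anything from the question's code.
import heapq

def merge_close_points(lon, lat, min_radius=0.015):
    """Repeatedly merge the closest neighbouring pair until no neighbouring
    pair is closer than min_radius (sketch, not a definitive implementation)."""
    n = len(lon)
    lon = list(lon)                      # work on mutable copies
    lat = list(lat)
    prev = list(range(-1, n - 1))        # doubly linked list via index arrays
    nxt = list(range(1, n + 1))
    if n:
        nxt[-1] = -1
    alive = [True] * n
    version = [0] * n                    # bumped to invalidate stale heap entries

    def sq_dist(i, j):
        dx = lon[j] - lon[i]
        dy = lat[j] - lat[i]
        return dx * dx + dy * dy

    # one heap entry per segment (point i to its successor)
    heap = [(sq_dist(i, i + 1), i, 0) for i in range(n - 1)]
    heapq.heapify(heap)

    limit = min_radius * min_radius      # compare squared distances only
    while heap:
        d, i, ver = heapq.heappop(heap)
        if not alive[i] or ver != version[i] or nxt[i] == -1:
            continue                     # stale entry, skip it
        if d >= limit:
            break                        # smallest remaining gap is large enough
        j = nxt[i]
        # merge j into i at the midpoint and unlink j: O(1)
        lon[i] = (lon[i] + lon[j]) / 2
        lat[i] = (lat[i] + lat[j]) / 2
        alive[j] = False
        nxt[i] = nxt[j]
        if nxt[i] != -1:
            prev[nxt[i]] = i
        # only the two neighbouring segments changed; push fresh entries
        version[i] += 1
        if nxt[i] != -1:
            heapq.heappush(heap, (sq_dist(i, nxt[i]), i, version[i]))
        p = prev[i]
        if p != -1:
            version[p] += 1
            heapq.heappush(heap, (sq_dist(p, i), p, version[p]))

    keep = [k for k in range(n) if alive[k]]
    return [lon[k] for k in keep], [lat[k] for k in keep]
Used as new_lon, new_lat = merge_close_points(lon, lat, 0.015), each merge touches O(1) data and pushes at most two fresh heap entries; stale entries are simply skipped when they surface, so the heap never has to be searched or re-sorted.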
EDIT 2:
Taking the square root is not necessary either; comparing squared distances is enough, as pointed out by @Jaime. I have corrected the code above.
EDIT 3:
Over the weekend I was thinking more about the algorithm, which turns out to be a very interesting problem and fun to think about. I wanted to share my thoughts.
Getting it to O(N log N) is less trivial than I thought. I had suggested keeping a sorted list sorted_d and managing it. Each time a link is removed, we have to update two entries in sorted_d, which means finding the new positions of the two corresponding distances in sorted_d. In a normal list, finding the position can be done by bisection in O(log N). However, as far as I understand it, bisection is not viable in a linked list and degenerates to O(N). This is the problem. I would be pleased if anyone could prove me wrong and show a way to insert an element into a sorted linked list in O(log N); I personally don't see it.
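For contrast, the standard-library route on a plain Python list shows exactly this trade-off (the values below are just illustrative): bisect can locate the slot in O(log N) comparisons because the list supports random access, but keeping the list contiguous makes the insertion itself O(N), whereas a linked list reverses the two costs.
import bisect

sorted_d = [0.1, 0.4, 0.9, 1.3]          # some already sorted distances
pos = bisect.bisect_left(sorted_d, 0.7)  # O(log N): relies on random access
sorted_d.insert(pos, 0.7)                # O(N): shifts the tail of the list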
There is an alternative to the linked list for sorted_d, however. All one needs is a B-tree or a Patricia trie that works with floats as keys. Insertion into such a tree can be done in (quite fast) O(log N). However, I do not know of any standard implementations of such a tree; the standard implementations work on strings or integers. Therefore, we could achieve it if we had an ordered and collision-free mapping from floats to ints. Such a mapping does not exist in general, but it can certainly be constructed for any particular dataset, and the dataset in question certainly is particular: there is an upper bound and a finite precision.
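For IEEE-754 doubles there is in fact a well-known bit trick that provides such an order-preserving mapping for all finite values (leaving NaN and the -0.0/+0.0 pair aside); a minimal sketch, with a function name of my own choosing:
import struct

def ordered_key(x):
    # Map a finite double to a uint64 so that x < y implies ordered_key(x) < ordered_key(y):
    # flip all bits for negative numbers, flip only the sign bit for non-negative ones.
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    if bits & (1 << 63):
        return bits ^ 0xFFFFFFFFFFFFFFFF  # negative: reverse the order
    return bits | (1 << 63)               # non-negative: move above all negatives
The resulting integers could then serve directly as keys in a Patricia trie or any other integer-keyed search tree.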
Hope this helps. If there is further interest, I could also present my thoughts on an optimal data structure for this problem.
Upvotes: 5