bastoon
bastoon

Reputation: 1

How can I remove too close points in a list

I have a list of points with x,y coordinates:

List_coord=[(462, 435), (491, 953), (617, 285),(657, 378)]

This list lenght (4 element here) can be very large from few hundred up to 35000 elements.

I want to remove too close points by threshold in this list. note:Points are never at the exact same position.

My current code for that:

while iteration<5:
  for pt in List_coord:
    for PT in List_coord:
        if (abs(pt[0]-PT[0])+abs(pt[1]-PT[1]))!=0 and abs(pt[0]-PT[0])<threshold and abs(pt[1]-PT[1])<threshold:
         List_coord.remove(PT)
  iteration=iteration+1

Explication of my terrible code :) :

This code is working but it is a very low process!

I am sure there is another method much easier but i wasn't able to find even if some allready answered questions are close to mine..

note:I would like to avoid using extra library for that code if it is possible

Upvotes: 0

Views: 1645

Answers (1)

Cort Ammon
Cort Ammon

Reputation: 10863

Python will be a bit slow at this ;-)

The solution you will probably want is called quad-trees, but I'll mention a simpler approach first, in case it's preferable.

The usual approach is to group the points so that you can easily reject points that are clearly far away from each other.

One approach might be to sort the list twice, once by x once by y. You can prove that if two points are too-close, they must be close in one dimension or the other. Thus your inner loop can break out early. If it sees a point that is too far away from the outer point in the sorted direction, it can know for a fact that all future points in that list are also too far away. Thus it doesn't have to look any further. Do this in X and Y and you're set!

This approach is going to tend to be dominated by the O(n log n) sort times. However, if all of your points share a single x value, you'll end up doing the same slow O(n^2) iteration that you're doing right now because you never terminate the inner loop early.

The more robust solution is to use quadtrees. Quadtrees are designed to solve the kind of problem you are looking at. The idea is to build a tree such that you can rapidly exclude large numbers of points. I'd recommend this.

If your number of points gets too large, I'd recommend getting a clustering library. Efficient clustering is a very difficult task, and often done in C++ or another fast language.

Upvotes: 1

Related Questions