Mr. Randy Tom
Mr. Randy Tom

Reputation: 321

Removing the 2d point that are close to each others

I would like to remove the coordinates that lie close to each other or if it is just a duplicate.

For example,

x = [[9, 169], [5, 164],[340,210],[1020,102],[210,312],[12,150]]

In the above list, the first and second element lies close to each other. How do I remove the second element while preserving the first one?

Following is what I have tried,

def process(input_list, thresh=(10, 10)):
    buffer = input_list.copy()
    n = 0
    prev_cx, prev_cy = 0, 0
    for i in range(len(input_list)):
        elem = input_list[i]
        cx, cy = elem
        if n == 0:
            prev_cx, prev_cy = cx, cy
        else:
            ab_cx, ab_cy = abs(prev_cx - cx), abs(prev_cy - cy)
            if ab_cx <= thresh[0] and ab_cy <= thresh[1]:
                del buffer[i]
        n += 1
    return buffer


x = [[9, 169], [5, 164], [340, 210], [1020, 102], [210, 312], [12, 150]]
processed = process(x)
print(processed)

The problem is that it doesn't recursively check if there are any other duplicates since it only checks the adjacent coordinates. What is an efficient way of filtering the coordinate?

Sample Input with thresh = (10,10):

x = [[12,24], [5, 12],[100,1020], [20,30], [121,214], [15,12]]

Sample output:

x = [[12,24],[100,1020], [121,214]]

Upvotes: 1

Views: 1506

Answers (2)

cadolphs
cadolphs

Reputation: 9617

I'd split this up a bit different. It's also tricky of course because of the way you have to modify the list.

def remove_close_neighbors(input_list, thresh, position):
  target_item = input_list[position]
  return [item for i, item in enumerate(input_list) if i == position or not is_close(target_item, item, thresh)]

This will remove all the "duplicate" (or close) points, other than the item under consideration.

(Then define is_close to check the threshold condition)

And then we can go over our items:

def process(input_list, thresh):
  pos = 0
  while pos < len(input_list):
    input_list = remove_close_neighbors(input_list, thresh, pos)
    pos += 1

This is by no means the most efficient way to achieve this. Depends on how scalable this needs to be for you. If we're talking "a bajillion points", you will need to look into clever data structures and algorithms. I think a tree structure could be good then, to group points "by sector", because then you don't have to compare each point to each other point all the time.

Upvotes: 1

Lucas Ng
Lucas Ng

Reputation: 661

Your question is a bit vague, but I'm taking it to mean:

  1. You want to compare all combinations of points
  2. If a combination contains points closer than a threshold
  3. Then remove the point further from the start of the input list

Try this:

import itertools

def process(input_list, threshold=(10,10)):
    combos = itertools.combinations(input_list, 2)
    points_to_remove = [point2
                        for point1, point2 in combos
                        if abs(point1[0]-point2[0])<=threshold[0] and abs(point1[1]-point2[1])<=threshold[1]]
    points_to_keep = [point for point in input_list if point not in points_to_remove]
    return points_to_keep

coords = [[12,24], [5, 12],[100,1020], [20,30], [121,214], [15,12]]
print(process(coords))
    
>>> [[12, 24], [5, 12], [100, 1020], [121, 214]]

The way this works is to generate all combinations of points using itertools (which leaves the points in the original order), and then create a list of points to remove using the threshold. Then it returns a list of points not in that list of points to remove.

You'll notice that I have an extra point than you. I simply copied what seemed to be your intended functionality (i.e. both dy AND dx <= thresh for point removal). However, if I change the line with the AND statement to remove point if dy OR dx <= thresh, I get the same output as your sample.

So I'm going to ask you to recheck your sample output.

BTW, it might be useful for you to confirm if checking for x and y proximity separately is what you really want. So as a bonus, I've included a version using the Euclidean distance as well:

import itertools
import math

def process(input_list, threshold=100):
    combos = itertools.combinations(input_list, 2)
    points_to_remove = [point2 for point1, point2 in combos if math.dist(point1, point2)<=threshold]
    points_to_keep = [point for point in input_list if point not in points_to_remove]
    return points_to_keep

coords = [[12,24], [5, 12],[100,1020], [20,30], [121,214], [15,12]]
print(process(coords))

>>> [[12, 24], [100, 1020], [121, 214]]

This version fits your original sample when I used a threshold radius of 100.

Upvotes: 2

Related Questions