b0bz
b0bz

Reputation: 2200

Group a list of points/tuples

Given a list of tuples / points I'm trying to find out how I can group each tuples that is within' a given bound (distance). It's hard to explain, but the short code should explain what I mean... I simply can't find a solution, nor how to explain the problem properly.

EG:

TPL = [(1, 1), (2, 1), (3, 2), (7, 5), (2, 7), (6, 4), (2, 3), (2, 6), (3, 1)]
Print GroupTPL(TPL, distance=1)
> [
>  [(2, 7), (2, 6)], 
>  [(6, 4), (7, 5)], 
>  [(3, 2), (3, 1), (2, 3), (1, 1), (2, 1)]
> ]

All the things i've tried, thought up is junk.. So I see no reason to even consider sharing, hope you guys got some tips, and tricks.

Upvotes: 0

Views: 1324

Answers (3)

b0bz
b0bz

Reputation: 2200

Just to fill in an alternative, which is not by default faster than the Union-Find code given by musically-ut, but it's is simple to use with Cython and thereby achieving 3x speedup, yet there are cases where it's faster by default. It's not my work, it's something that was found here: https://github.com/MerlijnWajer/Simba/blob/master/Units/MMLCore/tpa.pas

Cython-code: (remove cdef int..., and int w, int h to use with Python)

def group_pts(pts, int w, int h):
  cdef int t1, t2, c, ec, tc, l

  l = len(pts)-1
  if (l < 0): return False
  result = [list() for i in range(l+1)]
  c = 0
  ec = 0
  while ((l - ec) >= 0):
    result[c].append(pts[0])
    pts[0] = pts[l - ec]
    ec += 1
    tc = 1
    t1 = 0
    while (t1 < tc):
      t2 = 0
      while (t2 <= (l - ec)):
        if (abs(result[c][t1][0] - pts[t2][0]) <= w) and \
           (abs(result[c][t1][1] - pts[t2][1]) <= h):

          result[c].append(pts[t2])
          pts[t2] = pts[l - ec]
          ec += 1
          tc += 1
          t2 -= 1
        t2 += 1
      t1 += 1
    c += 1

  return result[0:c]

This can probably be optimized a bit, but I've not take the time to do so. This also allows for duplicates, which Union-Find structure is not very happy about.

It could be interesting to use SciPy's kd-tree to handle this, that would without a doubt bring the speed up for larger datasets.

Upvotes: 0

musically_ut
musically_ut

Reputation: 34288

I am assuming that you intend Chebyshev distance when you wanted to cluster the points together.

In this case, the most straight forward way to do it would be by using a Union Find data structure.

Here is a implementation I have used:

class UnionFind:
    """Union-find data structure. Items must be hashable."""

    def __init__(self):
        """Create a new empty union-find structure."""
        self.weights = {}
        self.parents = {}

    def __getitem__(self, obj):
        """X[item] will return the token object of the set which contains `item`"""

        # check for previously unknown object
        if obj not in self.parents:
            self.parents[obj] = obj 
            self.weights[obj] = 1
            return obj 

        # find path of objects leading to the root
        path = [obj]
        root = self.parents[obj]
        while root != path[-1]:
            path.append(root)
            root = self.parents[root]

        # compress the path and return
        for ancestor in path:
            self.parents[ancestor] = root
        return root

    def union(self, obj1, obj2):
        """Merges sets containing obj1 and obj2."""
        roots = [self[obj1], self[obj2]]
        heavier = max([(self.weights[r],r) for r in roots])[1]
        for r in roots:
            if r != heavier:
                self.weights[heavier] += self.weights[r]
                self.parents[r] = heavier

Then writing the function groupTPL is easy:

def groupTPL(TPL, distance=1):
    U = UnionFind()

    for (i, x) in enumerate(TPL):
        for j in range(i + 1, len(TPL)):
            y = TPL[j]
            if max(abs(x[0] - y[0]), abs(x[1] - y[1])) <= distance:
                U.union(x, y)

    disjSets = {}
    for x in TPL:
        s = disjSets.get(U[x], set())
        s.add(x)
        disjSets[U[x]] = s

    return [list(x) for x in disjSets.values()]

Running it on your set produces:

>>> groupTPL([(1, 1), (2, 1), (3, 2), (7, 5), (2, 7), (6, 4), (2, 3), (2, 6), (3, 1)])
[
 [(2, 7), (2, 6)], 
 [(6, 4), (7, 5)], 
 [(3, 2), (3, 1), (2, 3), (1, 1), (2, 1)]
]

However, this implementation, though simple, is still O(n^2). If the number of points grows very large, an efficient implementation would use k-d trees.

Upvotes: 3

namit
namit

Reputation: 6957

my answer is late; but this is short and working!!

from itertools import combinations

def groupTPL(inputlist):  
    ptdiff = lambda (p1,p2):(p1,p2,abs(p1[0]-p2[0])+ abs(p1[1]-p2[1]),sqrt((p2[1] - p1[1])**2 + (p2[0] - p1[0])**2 ))
    diffs=[ x for x in map(ptdiff, combinations(inputlist,2)) if x[2]==1 or x[3]==sqrt(2)]
    nk1=[]
    for x in diffs:
        if len(nk1)>0:
            for y in nk1:
                if x[0] in y or x[1] in y:
                    y.add(x[0])
                    y.add(x[1])
                else:
                    if set(x[0:2]) not in nk1:
                        nk1.append(set(x[0:2]))
        else:
            nk1.append(set(x[0:2]))
    return [list(x) for x in nk1]

print groupTPL([(1, 1), (2, 1), (3, 2), (7, 5), (2, 7), (6, 4), (2, 3), (2, 6), (3, 1)])

this will give output as::::

[[(3, 2), (3, 1), (2, 3), (1, 1), (2, 1)], [(6, 4), (7, 5)], [(2, 7), (2, 6)]]

Upvotes: 1

Related Questions