Reputation: 2200
Given a list of tuples / points I'm trying to find out how I can group each tuples that is within' a given bound (distance). It's hard to explain, but the short code should explain what I mean... I simply can't find a solution, nor how to explain the problem properly.
EG:
TPL = [(1, 1), (2, 1), (3, 2), (7, 5), (2, 7), (6, 4), (2, 3), (2, 6), (3, 1)]
Print GroupTPL(TPL, distance=1)
> [
> [(2, 7), (2, 6)],
> [(6, 4), (7, 5)],
> [(3, 2), (3, 1), (2, 3), (1, 1), (2, 1)]
> ]
All the things i've tried, thought up is junk.. So I see no reason to even consider sharing, hope you guys got some tips, and tricks.
Upvotes: 0
Views: 1324
Reputation: 2200
Just to fill in an alternative, which is not by default faster than the Union-Find code given by musically-ut
, but it's is simple to use with Cython
and thereby achieving 3x speedup, yet there are cases where it's faster by default. It's not my work, it's something that was found here: https://github.com/MerlijnWajer/Simba/blob/master/Units/MMLCore/tpa.pas
Cython-code: (remove cdef int..., and int w, int h to use with Python)
def group_pts(pts, int w, int h):
cdef int t1, t2, c, ec, tc, l
l = len(pts)-1
if (l < 0): return False
result = [list() for i in range(l+1)]
c = 0
ec = 0
while ((l - ec) >= 0):
result[c].append(pts[0])
pts[0] = pts[l - ec]
ec += 1
tc = 1
t1 = 0
while (t1 < tc):
t2 = 0
while (t2 <= (l - ec)):
if (abs(result[c][t1][0] - pts[t2][0]) <= w) and \
(abs(result[c][t1][1] - pts[t2][1]) <= h):
result[c].append(pts[t2])
pts[t2] = pts[l - ec]
ec += 1
tc += 1
t2 -= 1
t2 += 1
t1 += 1
c += 1
return result[0:c]
This can probably be optimized a bit, but I've not take the time to do so. This also allows for duplicates, which Union-Find structure is not very happy about.
It could be interesting to use SciPy's kd-tree to handle this, that would without a doubt bring the speed up for larger datasets.
Upvotes: 0
Reputation: 34288
I am assuming that you intend Chebyshev distance when you wanted to cluster the points together.
In this case, the most straight forward way to do it would be by using a Union Find data structure.
Here is a implementation I have used:
class UnionFind:
"""Union-find data structure. Items must be hashable."""
def __init__(self):
"""Create a new empty union-find structure."""
self.weights = {}
self.parents = {}
def __getitem__(self, obj):
"""X[item] will return the token object of the set which contains `item`"""
# check for previously unknown object
if obj not in self.parents:
self.parents[obj] = obj
self.weights[obj] = 1
return obj
# find path of objects leading to the root
path = [obj]
root = self.parents[obj]
while root != path[-1]:
path.append(root)
root = self.parents[root]
# compress the path and return
for ancestor in path:
self.parents[ancestor] = root
return root
def union(self, obj1, obj2):
"""Merges sets containing obj1 and obj2."""
roots = [self[obj1], self[obj2]]
heavier = max([(self.weights[r],r) for r in roots])[1]
for r in roots:
if r != heavier:
self.weights[heavier] += self.weights[r]
self.parents[r] = heavier
Then writing the function groupTPL
is easy:
def groupTPL(TPL, distance=1):
U = UnionFind()
for (i, x) in enumerate(TPL):
for j in range(i + 1, len(TPL)):
y = TPL[j]
if max(abs(x[0] - y[0]), abs(x[1] - y[1])) <= distance:
U.union(x, y)
disjSets = {}
for x in TPL:
s = disjSets.get(U[x], set())
s.add(x)
disjSets[U[x]] = s
return [list(x) for x in disjSets.values()]
Running it on your set produces:
>>> groupTPL([(1, 1), (2, 1), (3, 2), (7, 5), (2, 7), (6, 4), (2, 3), (2, 6), (3, 1)])
[
[(2, 7), (2, 6)],
[(6, 4), (7, 5)],
[(3, 2), (3, 1), (2, 3), (1, 1), (2, 1)]
]
However, this implementation, though simple, is still O(n^2)
. If the number of points grows very large, an efficient implementation would use k-d trees.
Upvotes: 3
Reputation: 6957
my answer is late; but this is short and working!!
from itertools import combinations
def groupTPL(inputlist):
ptdiff = lambda (p1,p2):(p1,p2,abs(p1[0]-p2[0])+ abs(p1[1]-p2[1]),sqrt((p2[1] - p1[1])**2 + (p2[0] - p1[0])**2 ))
diffs=[ x for x in map(ptdiff, combinations(inputlist,2)) if x[2]==1 or x[3]==sqrt(2)]
nk1=[]
for x in diffs:
if len(nk1)>0:
for y in nk1:
if x[0] in y or x[1] in y:
y.add(x[0])
y.add(x[1])
else:
if set(x[0:2]) not in nk1:
nk1.append(set(x[0:2]))
else:
nk1.append(set(x[0:2]))
return [list(x) for x in nk1]
print groupTPL([(1, 1), (2, 1), (3, 2), (7, 5), (2, 7), (6, 4), (2, 3), (2, 6), (3, 1)])
this will give output as::::
[[(3, 2), (3, 1), (2, 3), (1, 1), (2, 1)], [(6, 4), (7, 5)], [(2, 7), (2, 6)]]
Upvotes: 1