Reputation: 41
I have a dataframe of X and Y coordinates representing circles detected with OpenCV in Python. The circles form distinct rows and columns, and I would like to cluster them row by row.
However, the coordinates are sometimes rotated slightly, as shown below. The rotation can be either clockwise or counterclockwise.
What would be the simplest way to group these coordinates together row by row?
Here is a sample dataframe:
import pandas as pd
sample = pd.DataFrame({
'X_center': {72: 0.098054,
137: 0.112574,
254: 0.14442,
322: 0.113445,
365: 0.113445,
370: 0.188365,
384: 0.158165,
386: 0.173459,
401: 0.040267,
405: 0.128303,
408: 0.128352,
415: 0.174039,
451: 0.187688,
454: 0.159326,
482: 0.158213,
500: 0.024828,
519: 0.010309,
603: 0.08489,
606: 0.188946,
613: 0.128932,
684: 0.114026,
688: 0.141709,
717: 0.172878,
738: 0.143113,
816: 0.054787,
824: 0.157778,
841: 0.187639,
876: 0.069064,
890: 0.128448,
908: 0.024247,
937: 0.186865,
939: 0.083293,
964: 0.069306,
974: 0.098587,
976: 0.158794,
1035: 0.171474,
1037: 0.084842,
1097: 0.143016,
1100: 0.159181,
1106: 0.054835,
1111: 0.173652,
1189: 0.114413,
1199: 0.113639,
1209: 0.025312,
1214: 0.084067,
1283: 0.156326,
1313: 0.127142,
1447: 0.099313,
1494: 0.142145,
1535: 0.083922,
1557: 0.174426,
1580: 0.172733,
1607: 0.114413,
1618: 0.039009,
1626: 0.055609,
1820: 0.0997,
1866: 0.043945,
1877: 0.070322,
1890: 0.084842,
1909: 0.128448,
1951: 0.173217,
1952: 0.144275,
1978: 0.052221,
1988: 0.112235,
2002: 0.127384,
2063: 0.009825,
2106: 0.129174,
2113: 0.005033,
2137: 0.158939,
2182: 0.010357},
'Y_center': {72: 0.118009,
137: 0.101591,
254: 0.197024,
322: 0.118112,
365: 0.150077,
370: 0.148589,
384: 0.117599,
386: 0.148999,
401: 0.199025,
405: 0.117137,
408: 0.13371,
415: 0.180605,
451: 0.116983,
454: 0.196614,
482: 0.13335,
500: 0.060595,
519: 0.198923,
603: 0.18235,
606: 0.1804,
613: 0.165623,
684: 0.165829,
688: 0.054284,
717: 0.117394,
738: 0.118266,
816: 0.182863,
824: 0.101796,
841: 0.085428,
876: 0.150539,
890: 0.149615,
908: 0.038122,
937: 0.053207,
939: 0.118676,
964: 0.166855,
974: 0.150077,
976: 0.149666,
1035: 0.037917,
1037: 0.166496,
1097: 0.149359,
1100: 0.165469,
1106: 0.166496,
1111: 0.164802,
1189: 0.181632,
1199: 0.133915,
1209: 0.18312,
1214: 0.134582,
1283: 0.038019,
1313: 0.102258,
1447: 0.166034,
1494: 0.086455,
1535: 0.150128,
1557: 0.196408,
1580: 0.101539,
1607: 0.197383,
1618: 0.120062,
1626: 0.198102,
1820: 0.197435,
1866: 0.038481,
1877: 0.198102,
1890: 0.197281,
1909: 0.08589,
1951: 0.133043,
1952: 0.181683,
1978: 0.087276,
1988: 0.039251,
2002: 0.054797,
2063: 0.15136,
2106: 0.197075,
2113: 0.082555,
2137: 0.181016,
2182: 0.167317}})
Upvotes: 3
Views: 659
Reputation: 6025
This is probably too late, and you have likely found a solution by now, but I hope my answer is still useful.
If by "rotated slightly" you mean a rotation as small as the one in your example, even k-means handles it well. I clustered on the y coordinate alone and used the silhouette score to choose the number of clusters; the result looks correct:
import math
from sklearn.cluster import KMeans
from sklearn import metrics
from getSample import getSample

x, y = getSample()  # gets coordinates of points in numpy arrays
maxScore = -math.inf
# Try every candidate number of layers and keep the best-scoring model
for k in range(2, 21):
    model = KMeans(n_clusters=k)
    C = model.fit(y.reshape(-1, 1))  # cluster on the y coordinate only
    score = metrics.silhouette_score(
        y.reshape(-1, 1), C.labels_, metric='euclidean')
    if score > maxScore:
        maxScore = score
        bestC = C
print(bestC.n_clusters)
print(bestC.cluster_centers_)  # y-intercept of horizontal lines, each representing a layer
10
[0.038358 0.05572075 0.0855208 0.101796 0.11802644 0.13372
0.1498409 0.16610233 0.18170863 0.19757927]
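The `getSample` helper imported above is not shown in this answer. To experiment with the code, a minimal stand-in could look like the sketch below. Everything in it is an assumption made for illustration: the name `get_sample`, the grid size, the spacing, the jitter, and the convention that `rotation` is in radians and counterclockwise about the origin.

```python
import numpy as np

def get_sample(rows=10, cols=12, spacing=0.016, jitter=0.0005, rotation=0.0, seed=0):
    """Hypothetical stand-in for getSample: a jittered grid, optionally rotated."""
    rng = np.random.default_rng(seed)
    # Regular grid: every row shares a y value, every column an x value.
    gx, gy = np.meshgrid(np.arange(cols) * spacing, np.arange(rows) * spacing)
    x = gx.ravel() + rng.normal(0.0, jitter, gx.size)
    y = gy.ravel() + rng.normal(0.0, jitter, gy.size)
    # Rotate counterclockwise about the origin by `rotation` radians (assumed convention).
    c, s = np.cos(rotation), np.sin(rotation)
    return c * x - s * y, s * x + c * y

x, y = get_sample(rotation=np.radians(3.0))
print(x.shape, y.shape)
```

With the real data, the equivalent would presumably be `sample['X_center'].to_numpy()` and `sample['Y_center'].to_numpy()`.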
The problem arises when the rotation angle is large enough that the layers overlap in their y ranges. In that case, we must first determine the angle of rotation. I suggest the following algorithm:
Take the vector between every pair of points, keep only the vectors that are more horizontal than vertical, and of those keep the shortest ones. Flip any vector that points left so that each one runs from the left point of its pair to the right point. These vectors most likely share the same direction, which is the direction of the layers, so their mean gives the slope.
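import itertools
import math
import numpy as np
from getSample import getSample

# a: known rotation angle in radians, defined beforehand; used only to measure the error
x, y = getSample(rotation=a)
# Vectors between every pair of points
iPairs = range(len(x))
pairs = np.array(list(itertools.combinations(iPairs, 2)))
vx = x[pairs[:, 0]]-x[pairs[:, 1]]
vy = y[pairs[:, 0]]-y[pairs[:, 1]]
# Keep the vectors that are more horizontal than vertical
hClose = np.abs(vx) > np.abs(vy)
vx = vx[hClose]
vy = vy[hClose]
# Keep the shortest ones (as many as 20% of the number of points)
mag = np.sqrt(np.square(vx) + np.square(vy))
iClosest = np.argsort(mag)[:int(len(x)*.2)]
vx = vx[iClosest]
vy = vy[iClosest]
# Flip the vectors that point left so that all of them point right
iFlip = vx<0.0
vx[iFlip] = -vx[iFlip]
vy[iFlip] = -vy[iFlip]
# The mean vector gives the slope and the angle of the layers
layerSlope = np.mean(vy) / np.mean(vx)
a2 = math.atan2(np.mean(vy), np.mean(vx))
print("Error: %.1f°"%(math.degrees(abs(a-a2))))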
If, unlike in the provided example, the spacing between the points of a layer can be unequal, the vectors must be normalized before averaging.
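The normalization mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name `estimate_row_angle`, its `keep_fraction` and `normalize` parameters, and the synthetic test grid (8 rows of 10 points, no jitter, unequal spacing within each row, rotated by 5°) are all made up for the example. It also assumes no two points coincide, since a zero-length vector cannot be normalized.

```python
import itertools
import numpy as np

def estimate_row_angle(x, y, keep_fraction=0.2, normalize=True):
    """Estimate the row direction (radians) from short, mostly horizontal pair vectors."""
    pairs = np.array(list(itertools.combinations(range(len(x)), 2)))
    vx = x[pairs[:, 0]] - x[pairs[:, 1]]
    vy = y[pairs[:, 0]] - y[pairs[:, 1]]
    # Keep pairs that are more horizontal than vertical.
    horizontal = np.abs(vx) > np.abs(vy)
    vx, vy = vx[horizontal], vy[horizontal]
    # Keep the shortest vectors; these mostly join neighbours in the same row.
    mag = np.hypot(vx, vy)
    closest = np.argsort(mag)[: int(len(x) * keep_fraction)]
    vx, vy, mag = vx[closest], vy[closest], mag[closest]
    # Make every vector point from left to right.
    sign = np.where(vx < 0.0, -1.0, 1.0)
    vx, vy = vx * sign, vy * sign
    if normalize:
        # Unit vectors: each pair gets equal weight regardless of its length.
        vx, vy = vx / mag, vy / mag
    return np.arctan2(vy.mean(), vx.mean())

# Synthetic grid (assumed for illustration): spacing within a row varies from 0.01 to 0.03.
rng = np.random.default_rng(0)
true_angle = np.radians(5.0)
gx = np.cumsum(rng.uniform(0.01, 0.03, size=(8, 10)), axis=1)
gy = np.repeat(np.arange(8)[:, None] * 0.05, 10, axis=1)
c, s = np.cos(true_angle), np.sin(true_angle)
x = (c * gx - s * gy).ravel()
y = (s * gx + c * gy).ravel()

angle = estimate_row_angle(x, y)
print("estimated angle: %.2f degrees" % np.degrees(angle))
```

Without normalization, longer vectors contribute more to the mean; with it, every retained pair votes equally on the direction, which matters once noise affects short and long vectors differently.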
After finding the slope of the layers, the rest of the work is the same as in the case without rotation. The only difference is that instead of clustering on the y component of the points, the clustering is done on their vertical distance from the line that passes through the origin with a slope equal to the value found.
y2 = y - layerSlope * x
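To illustrate how the corrected coordinate separates the layers, here is a small sketch. Note that it does not use k-means: to stay dependency-free it swaps in a simple gap-based split of the sorted offsets, which should work when rows are well separated after the correction. The names `label_rows` and `gap_ratio`, and the synthetic 5 x 8 grid rotated by 10°, are assumptions made for this example.

```python
import numpy as np

def label_rows(x, y, layer_slope, gap_ratio=0.5):
    """Label points by row after removing the tilt given by layer_slope."""
    # Vertical offset of each point from the line through the origin with that slope.
    y2 = y - layer_slope * x
    order = np.argsort(y2)
    gaps = np.diff(y2[order])
    # A new row starts wherever the gap exceeds a fraction of the largest gap.
    breaks = gaps > gap_ratio * gaps.max()
    labels = np.empty(len(y2), dtype=int)
    labels[order] = np.concatenate(([0], np.cumsum(breaks)))
    return labels

# Synthetic grid (assumed): 5 rows of 8 points, rotated enough that rows overlap in y.
angle = np.radians(10.0)
gx, gy = np.meshgrid(np.arange(8) * 0.02, np.arange(5) * 0.02)
c, s = np.cos(angle), np.sin(angle)
x = (c * gx - s * gy).ravel()
y = (s * gx + c * gy).ravel()

labels = label_rows(x, y, layer_slope=np.tan(angle))
print(labels.reshape(5, 8))
```

On noisy data the fixed `gap_ratio` threshold may need tuning, and the k-means plus silhouette approach shown earlier can be applied to `y2` instead.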
Upvotes: 2