datasciencefordummies

Reputation: 41

How can I cluster coordinate values into rows using their Y-axis value?

Currently I have a dataframe of X, Y coordinates representing circles that were detected with OpenCV in Python. The circles form distinct rows and columns, and I would like to cluster them row by row.


However, the coordinates are sometimes rotated slightly. The rotation can be either clockwise or counterclockwise.

What would be the simplest way to group these coordinates together row by row?

Here is a sample dataframe:

import pandas as pd

sample = pd.DataFrame({
 'X_center': {72: 0.098054,
  137: 0.112574,
  254: 0.14442,
  322: 0.113445,
  365: 0.113445,
  370: 0.188365,
  384: 0.158165,
  386: 0.173459,
  401: 0.040267,
  405: 0.128303,
  408: 0.128352,
  415: 0.174039,
  451: 0.187688,
  454: 0.159326,
  482: 0.158213,
  500: 0.024828,
  519: 0.010309,
  603: 0.08489,
  606: 0.188946,
  613: 0.128932,
  684: 0.114026,
  688: 0.141709,
  717: 0.172878,
  738: 0.143113,
  816: 0.054787,
  824: 0.157778,
  841: 0.187639,
  876: 0.069064,
  890: 0.128448,
  908: 0.024247,
  937: 0.186865,
  939: 0.083293,
  964: 0.069306,
  974: 0.098587,
  976: 0.158794,
  1035: 0.171474,
  1037: 0.084842,
  1097: 0.143016,
  1100: 0.159181,
  1106: 0.054835,
  1111: 0.173652,
  1189: 0.114413,
  1199: 0.113639,
  1209: 0.025312,
  1214: 0.084067,
  1283: 0.156326,
  1313: 0.127142,
  1447: 0.099313,
  1494: 0.142145,
  1535: 0.083922,
  1557: 0.174426,
  1580: 0.172733,
  1607: 0.114413,
  1618: 0.039009,
  1626: 0.055609,
  1820: 0.0997,
  1866: 0.043945,
  1877: 0.070322,
  1890: 0.084842,
  1909: 0.128448,
  1951: 0.173217,
  1952: 0.144275,
  1978: 0.052221,
  1988: 0.112235,
  2002: 0.127384,
  2063: 0.009825,
  2106: 0.129174,
  2113: 0.005033,
  2137: 0.158939,
  2182: 0.010357},
 'Y_center': {72: 0.118009,
  137: 0.101591,
  254: 0.197024,
  322: 0.118112,
  365: 0.150077,
  370: 0.148589,
  384: 0.117599,
  386: 0.148999,
  401: 0.199025,
  405: 0.117137,
  408: 0.13371,
  415: 0.180605,
  451: 0.116983,
  454: 0.196614,
  482: 0.13335,
  500: 0.060595,
  519: 0.198923,
  603: 0.18235,
  606: 0.1804,
  613: 0.165623,
  684: 0.165829,
  688: 0.054284,
  717: 0.117394,
  738: 0.118266,
  816: 0.182863,
  824: 0.101796,
  841: 0.085428,
  876: 0.150539,
  890: 0.149615,
  908: 0.038122,
  937: 0.053207,
  939: 0.118676,
  964: 0.166855,
  974: 0.150077,
  976: 0.149666,
  1035: 0.037917,
  1037: 0.166496,
  1097: 0.149359,
  1100: 0.165469,
  1106: 0.166496,
  1111: 0.164802,
  1189: 0.181632,
  1199: 0.133915,
  1209: 0.18312,
  1214: 0.134582,
  1283: 0.038019,
  1313: 0.102258,
  1447: 0.166034,
  1494: 0.086455,
  1535: 0.150128,
  1557: 0.196408,
  1580: 0.101539,
  1607: 0.197383,
  1618: 0.120062,
  1626: 0.198102,
  1820: 0.197435,
  1866: 0.038481,
  1877: 0.198102,
  1890: 0.197281,
  1909: 0.08589,
  1951: 0.133043,
  1952: 0.181683,
  1978: 0.087276,
  1988: 0.039251,
  2002: 0.054797,
  2063: 0.15136,
  2106: 0.197075,
  2113: 0.082555,
  2137: 0.181016,
  2182: 0.167317}})

Upvotes: 3

Views: 659

Answers (1)

saastn

Reputation: 6025

This is late, and you have probably found a solution by now, but I hope this answer is still useful.

If by "rotated slightly" you mean a rotation about as small as the one in your example, no worries: even k-means on the y-coordinates alone handles it well. I used the silhouette score to choose the number of clusters, and the result looks correct:

import math
from sklearn.cluster import KMeans
from sklearn import metrics
from getSample import getSample  # local helper (not shown) that returns the sample points

x, y = getSample()  # coordinates of the points as NumPy arrays
Y = y.reshape(-1, 1)  # cluster on the y-coordinate only
maxScore = -math.inf
for k in range(2, 21):
    model = KMeans(n_clusters=k)
    C = model.fit(Y)
    # higher silhouette score means tighter, better separated clusters
    score = metrics.silhouette_score(Y, C.labels_, metric='euclidean')
    if score > maxScore:
        maxScore = score
        bestC = C
print(bestC.n_clusters)
print(bestC.cluster_centers_)  # y-intercepts of the horizontal lines, one per layer

Output:

10
[0.038358   0.05572075 0.0855208  0.101796   0.11802644 0.13372
 0.1498409  0.16610233 0.18170863 0.19757927]
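
k-means numbers its clusters arbitrarily, so the labels do not follow the top-to-bottom order of the rows. Below is a minimal sketch of remapping them by center value. The function name `order_rows` and the toy arrays are made up for illustration; `labels` and `centers` stand in for `bestC.labels_` and `bestC.cluster_centers_`.

```python
import numpy as np

def order_rows(labels, centers):
    """Remap arbitrary cluster labels to row indices sorted by center y-value."""
    centers = np.asarray(centers, dtype=float).ravel()
    order = np.argsort(centers)            # cluster ids, lowest y first
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))    # rank[cluster id] -> row index
    return rank[np.asarray(labels)]

# Toy stand-ins for bestC.labels_ and bestC.cluster_centers_ (hypothetical values)
labels = np.array([2, 0, 1, 0, 2])
centers = np.array([[0.20], [0.05], [0.10]])
rows = order_rows(labels, centers)
```

With the question's dataframe this would presumably become `sample['row'] = order_rows(bestC.labels_, bestC.cluster_centers_)`, provided the model was fitted on `sample['Y_center']` in the dataframe's row order.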


The problem arises when the rotation angle is large enough that the layers overlap in their y ranges. In that case, the angle of rotation must be determined first. I suggest the following algorithm:

  1. Find all 2-combinations of the points in the set
  2. Calculate the vector between each pair of points
  3. Keep only the vectors whose x-component is larger in absolute value than their y-component
  4. Sort the vectors by their magnitude
  5. Keep the N shortest vectors, where N is twenty percent of the total number of points
  6. Reverse the vectors whose x-component is negative, so that they all point to the right
  7. Find the average angle of the vectors from the x-axis

This finds the pairs of points that are closest to each other and lie mostly to the left and right of one another, and builds a vector from the left point to the right point of each pair. These vectors most likely point in the same direction, which is the direction of the layers.

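import itertools
import math
import numpy as np
from getSample import getSample  # local helper (not shown)

# a is the known rotation of the test data; it must be in radians,
# because it is compared with the atan2 result below
x, y = getSample(rotation=a)

# steps 1-2: vectors between all pairs of points
iPairs = range(len(x))
pairs = np.array(list(itertools.combinations(iPairs, 2)))

vx = x[pairs[:, 0]]-x[pairs[:, 1]]
vy = y[pairs[:, 0]]-y[pairs[:, 1]]

# step 3: keep vectors that are more horizontal than vertical
hClose = np.abs(vx) > np.abs(vy)
vx = vx[hClose]
vy = vy[hClose]

# steps 4-5: keep the shortest vectors (20% of the number of points)
mag = np.sqrt(np.square(vx) + np.square(vy))
iClosest = np.argsort(mag)[:int(len(x)*.2)]
vx = vx[iClosest]
vy = vy[iClosest]

# step 6: make every vector point to the right
iFlip = vx<0.0
vx[iFlip] = -vx[iFlip]
vy[iFlip] = -vy[iFlip]

# step 7: slope and angle of the average vector
layerSlope = np.mean(vy) / np.mean(vx)

a2 = math.atan2(np.mean(vy), np.mean(vx))
print("Error: %.1f°"%(math.degrees(abs(a-a2))))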
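
Since `getSample` is not shown, here is a self-contained sketch of the same steps that can be run as a check. `make_grid` is a hypothetical stand-in that builds a regular grid rotated by a known angle, and `estimate_layer_angle`, the grid size, and the 5° test angle are assumptions chosen for illustration.

```python
import itertools
import math
import numpy as np

def make_grid(rows, cols, pitch, angle):
    """Hypothetical stand-in for getSample: a regular grid rotated by `angle` radians."""
    gx, gy = np.meshgrid(np.arange(cols) * pitch, np.arange(rows) * pitch)
    gx, gy = gx.ravel(), gy.ravel()
    c, s = math.cos(angle), math.sin(angle)
    return c * gx - s * gy, s * gx + c * gy

def estimate_layer_angle(x, y, fraction=0.2):
    """Estimate the row direction from the shortest mostly-horizontal point pairs."""
    pairs = np.array(list(itertools.combinations(range(len(x)), 2)))
    vx = x[pairs[:, 0]] - x[pairs[:, 1]]
    vy = y[pairs[:, 0]] - y[pairs[:, 1]]
    keep = np.abs(vx) > np.abs(vy)                 # more horizontal than vertical
    vx, vy = vx[keep], vy[keep]
    closest = np.argsort(np.hypot(vx, vy))[:max(1, int(len(x) * fraction))]
    vx, vy = vx[closest], vy[closest]
    flip = vx < 0                                  # make every vector point right
    vx, vy = np.where(flip, -vx, vx), np.where(flip, -vy, vy)
    return math.atan2(vy.mean(), vx.mean())

true_angle = math.radians(5.0)                     # assumed test rotation
x, y = make_grid(rows=6, cols=8, pitch=1.0, angle=true_angle)
estimated = estimate_layer_angle(x, y)
error_deg = math.degrees(abs(estimated - true_angle))
```

On a noise-free grid like this one, the shortest kept vectors all join horizontal neighbours, so the estimate should match the true angle up to rounding. Real detections will be noisier.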

If, unlike in the provided example, the spacing between the points of a layer is not uniform, the vectors should be normalized to unit length before averaging, so that longer vectors do not dominate the result.
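
A minimal sketch of that normalization, assuming no zero-length vectors. The helper name `mean_direction` and the two example vectors are made up for illustration.

```python
import math
import numpy as np

def mean_direction(vx, vy):
    """Average direction of vectors after scaling each one to unit length."""
    vx = np.asarray(vx, dtype=float)
    vy = np.asarray(vy, dtype=float)
    mag = np.hypot(vx, vy)
    ux, uy = vx / mag, vy / mag          # unit vectors; assumes mag > 0 everywhere
    return math.atan2(uy.mean(), ux.mean())

# Two vectors of very different length, pointing at 0 and 30 degrees
vx = [10.0, math.cos(math.radians(30.0))]
vy = [0.0, math.sin(math.radians(30.0))]
plain = math.atan2(np.mean(vy), np.mean(vx))   # dominated by the long vector
normalized = mean_direction(vx, vy)            # each vector counts equally
```

Without normalization the long vector pulls the average toward 0°, while the normalized average lands halfway between the two directions.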

After finding the slope of the layers, the rest of the work is the same as in the case without rotation. The only difference is that, instead of clustering on the y-component of the points, the clustering is done on their vertical distance from the line that passes through the origin with the slope found above.

y2 = y - layerSlope * x
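
To illustrate why this works: for points on a line y = m·x + c, subtracting m·x leaves the constant c, so every point of a row collapses onto one value. The sketch below checks this on a toy grid rotated by 15°, where the raw y ranges of neighbouring rows overlap. To stay free of scikit-learn it uses a simple gap-based split in place of the k-means step. `split_rows`, the grid, the gap threshold, and the use of the true slope are all assumptions for illustration.

```python
import math
import numpy as np

def split_rows(values, gap):
    """Label 1-D values, starting a new row whenever the sorted jump exceeds `gap`."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    jumps = np.diff(values[order]) > gap
    labels = np.empty(len(values), dtype=int)
    labels[order] = np.concatenate(([0], np.cumsum(jumps)))
    return labels

angle = math.radians(15.0)                      # assumed rotation for the toy data
gx, gy = np.meshgrid(np.arange(6.0), np.arange(4.0))   # 4 rows of 6 points
gx, gy = gx.ravel(), gy.ravel()
x = math.cos(angle) * gx - math.sin(angle) * gy
y = math.sin(angle) * gx + math.cos(angle) * gy

layer_slope = math.tan(angle)   # in practice, the slope estimated from the data
y2 = y - layer_slope * x        # one constant value per row
labels = split_rows(y2, gap=0.5)
```

The gap threshold has to sit between the within-row scatter of `y2` and the row spacing, so it would need tuning on real data.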


Upvotes: 2
