Ha An

Reputation: 11

Using NumPy for K-means clustering

I'm new to machine learning and want to build a K-means algorithm with k = 2, but I'm struggling to calculate the new centroids. Here is my code for kmeans:

def euclidean_distance(x: np.ndarray, y: np.ndarray):
    # x shape: (N1, D)
    # y shape: (N2, D)
    # output shape: (N1, N2)
    dist = []
    for i in x:
        for j in y:
            new_list = np.sqrt(sum((i - j) ** 2))
            dist.append(new_list)
    distance = np.reshape(dist, (len(x), len(y)))
    return distance

def kmeans(x, centroids, iterations=30):
    assignment = None
    for i in iterations:
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)

    for c in range(len(y)):
        centroids[c] = np.mean(x[assignment == c], 0) #error here
    
        return centroids, assignment

My inputs are x = [[1., 0.], [0., 1.], [0.5, 0.5]] and y = [[1., 0.], [0., 1.]], and the distance array looks like this:

[[0.         1.41421356]
[1.41421356 0.         ]
[0.70710678 0.70710678]]

When I run kmeans(x, y), it raises this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_40086/2170434798.py in
      5
      6 for c in range(len(y)):
----> 7     centroids[c] = (x[classes == c], 0)
      8     print(centroids)

TypeError: only integer scalar arrays can be converted to a scalar index

Does anyone know how to fix it or improve my code? Thank you in advance!

Upvotes: 1

Views: 599

Answers (1)

Arav

Reputation: 111

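Converting the inputs to NumPy arrays should get rid of the error. A plain Python list cannot be indexed with a boolean array such as assignment == c, but a NumPy array can: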

x = np.array([[1., 0.], [0., 1.], [0.5, 0.5]])
y = np.array([[1., 0.], [0., 1.]])
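To see why the conversion matters, here is a small sketch of the difference. The names x_list, mask, selected and the example assignment values are illustrative only and do not come from the question's code.

```python
import numpy as np

x_list = [[1., 0.], [0., 1.], [0.5, 0.5]]   # plain Python list, as in the question
assignment = np.array([0, 1, 0])            # illustrative cluster labels
mask = assignment == 0                      # boolean NumPy array

# Indexing a plain list with a boolean array fails with a TypeError.
try:
    x_list[mask]
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# The same mask works on a NumPy array and selects rows 0 and 2.
x_arr = np.array(x_list)
selected = x_arr[mask]
centroid = selected.mean(axis=0)            # mean over the selected rows
```

With the array version, the mask picks out the points assigned to cluster 0, and their mean is the updated centroid for that cluster.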

Also, in the kmeans function you need to change for i in iterations to for i in range(iterations), because iterations is an integer and cannot be iterated over directly.
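Putting both fixes together, one possible version of the two functions is sketched below. It goes a bit beyond the two changes above and makes some assumptions of its own: the centroid update is moved inside the iteration loop, the return statement is moved after the loop, the distance is computed with broadcasting instead of nested loops, and a cluster that ends up empty keeps its previous centroid. Treat it as a starting point rather than a definitive implementation.

```python
import numpy as np

def euclidean_distance(x, y):
    # x shape: (N1, D), y shape: (N2, D), output shape: (N1, N2)
    diff = x[:, None, :] - y[None, :, :]        # broadcast to (N1, N2, D)
    return np.sqrt((diff ** 2).sum(axis=-1))

def kmeans(x, centroids, iterations=30):
    x = np.asarray(x, dtype=float)
    centroids = np.array(centroids, dtype=float)  # copy so the caller's data is untouched
    assignment = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)
        # Move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = x[assignment == c]
            if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = members.mean(axis=0)
    return centroids, assignment

x = [[1., 0.], [0., 1.], [0.5, 0.5]]
y = [[1., 0.], [0., 1.]]
centroids, assignment = kmeans(x, y)
```

For the inputs from the question, the point [0.5, 0.5] is equally far from both starting centroids, and np.argmin breaks the tie in favour of the first one, so the result is centroids [[0.75, 0.25], [0., 1.]] with assignment [0, 1, 0].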

Upvotes: 1
