I'm new to machine learning and want to build a k-means algorithm with k = 2, but I'm struggling to calculate the new centroids. Here is my code for k-means:
```python
import numpy as np

def euclidean_distance(x: np.ndarray, y: np.ndarray):
    # x shape: (N1, D)
    # y shape: (N2, D)
    # output shape: (N1, N2)
    dist = []
    for i in x:
        for j in y:
            new_list = np.sqrt(sum((i - j) ** 2))
            dist.append(new_list)
    distance = np.reshape(dist, (len(x), len(y)))
    return distance

def kmeans(x, centroids, iterations=30):
    assignment = None
    for i in iterations:
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)
        for c in range(len(y)):
            centroids[c] = np.mean(x[assignment == c], 0)  # error here
    return centroids, assignment
```
My inputs are `x = [[1., 0.], [0., 1.], [0.5, 0.5]]` and `y = [[1., 0.], [0., 1.]]`, and `distance` is an array that looks like this:

```
[[0.         1.41421356]
 [1.41421356 0.        ]
 [0.70710678 0.70710678]]
```
When I run `kmeans(x, y)`, it raises this error:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_40086/2170434798.py in
      5
      6 for c in range(len(y)):
----> 7 centroids[c] = (x[classes == c], 0)
      8 print(centroids)

TypeError: only integer scalar arrays can be converted to a scalar index
```
Does anyone know how to fix it or improve my code? Thank you in advance!
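Converting the inputs to NumPy arrays should get rid of the error. `x[assignment == c]` indexes `x` with a boolean mask, which works on a NumPy array but raises this `TypeError` on a plain Python list:

```python
x = np.array([[1., 0.], [0., 1.], [0.5, 0.5]])
y = np.array([[1., 0.], [0., 1.]])
```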
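A small sketch of the difference, using the labels `[0, 1, 0]` that `np.argmin(dist, axis=1)` returns for the question's inputs (ties go to the first index). The names `x_list`, `x_arr`, `selected` and `raised` are only illustrative:

```python
import numpy as np

# Labels argmin produces for the question's distance matrix
assignment = np.array([0, 1, 0])

x_list = [[1., 0.], [0., 1.], [0.5, 0.5]]
x_arr = np.array(x_list)

# Boolean-mask indexing on a NumPy array keeps the rows where the mask is True
selected = x_arr[assignment == 0]
print(selected)  # rows 0 and 2 of x

# The same expression on a plain Python list raises the TypeError from the question
raised = False
try:
    x_list[assignment == 0]
except TypeError as err:
    raised = True
    print("TypeError:", err)
```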
Also, in the `kmeans` function you need to change `for i in iterations` to `for i in range(iterations)`: `iterations` is an integer, and a plain integer is not iterable.
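Putting both fixes together, a minimal runnable sketch might look like this. It goes a little beyond the answer, so treat these as assumptions: it loops over `len(centroids)` instead of the global `y`, copies the initial centroids so `y` is not modified in place, keeps the old centroid when a cluster is empty (rather than getting NaN from averaging an empty slice), and replaces the double loop in `euclidean_distance` with NumPy broadcasting. It still always runs all `iterations` and does not stop early on convergence.

```python
import numpy as np


def euclidean_distance(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # x shape: (N1, D), y shape: (N2, D) -> output shape: (N1, N2)
    # Broadcasting (N1, 1, D) - (1, N2, D) gives every pairwise difference at once
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def kmeans(x, centroids, iterations=30):
    x = np.asarray(x, dtype=float)
    # Work on a float copy so the caller's initial centroids are not overwritten
    centroids = np.array(centroids, dtype=float)
    assignment = None
    for _ in range(iterations):  # range(): an int by itself is not iterable
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)  # index of the nearest centroid per point
        for c in range(len(centroids)):  # use centroids, not the global y
            members = x[assignment == c]
            if len(members) > 0:  # keep the old centroid if the cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids, assignment


x = np.array([[1., 0.], [0., 1.], [0.5, 0.5]])
y = np.array([[1., 0.], [0., 1.]])
centroids, assignment = kmeans(x, y)
print(centroids)
print(assignment)
```

With the question's inputs, the first update moves centroid 0 to the mean of `[1, 0]` and `[0.5, 0.5]`, i.e. `[0.75, 0.25]`, and centroid 1 stays at `[0, 1]`. The assignment `[0, 1, 0]` does not change after that.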