Reputation: 438
I want to find the optimal value of k using the elbow method. I'm not using the scikit-learn library. I have my k-means coded from scratch and now I'm having a difficult time figuring out how to code the elbow method in Python. I'm a total beginner.
This is my k-means code:
import numpy as np

def cluster_init(array, k):
    # Random initial assignment that guarantees every cluster 0..k-1 appears at least once
    initial_assgnm = np.append(np.arange(k), np.random.randint(0, k, size=(len(array))))[:len(array)]
    np.random.shuffle(initial_assgnm)
    zero_arr = np.zeros((len(initial_assgnm), 1))
    for indx, cluster_assgnm in enumerate(initial_assgnm):
        zero_arr[indx] = cluster_assgnm
    # Append the cluster label as the last column
    upd_array = np.append(array, zero_arr, axis=1)
    return upd_array

def kmeans(array, k):
    cluster_array = cluster_init(array, k)
    while True:
        unique_clusters = np.unique(cluster_array[:, -1])
        centroid_dictonary = {}
        for cluster in unique_clusters:
            centroid_dictonary[cluster] = np.mean(cluster_array[np.where(cluster_array[:, -1] == cluster)][:, :-1], axis=0)
        start_array = np.copy(cluster_array)
        for row in range(len(cluster_array)):
            cluster_array[row, -1] = unique_clusters[np.argmin(
                [np.linalg.norm(cluster_array[row, :-1] - centroid_dictonary.get(cluster)) for cluster in unique_clusters])]
        # Stop once no point changes cluster
        if np.array_equal(cluster_array, start_array):
            break
    return centroid_dictonary
This is what I have tried for the elbow method:
cost = []
K = range(1, 239)
for k in K:
    KM = kmeans(x, k)
    print(k)
    KM.fit(x)
    cost.append(KM.inertia_)
But I get the following error:
KM.fit(x)
AttributeError: 'dict' object has no attribute 'fit'
Upvotes: 1
Views: 813
Reputation: 2129
If you want to compute the elbow values from scratch, you need to compute the inertia for the current clustering assignment. To do this, you sum the per-point inertias. The inertia of a single data point is the squared distance from that point to its closest center. If you have a function that computes this for you (in scikit-learn this function is pairwise_distances_argmin_min), you could do
from sklearn.metrics import pairwise_distances_argmin_min

labels, mindist = pairwise_distances_argmin_min(
    X=X, Y=centers, metric='euclidean', metric_kwargs={'squared': True})
inertia = mindist.sum()
If you actually wanted to write this function yourself, you would loop over every row x in X and find the minimum over all y in Y of dist(x, y); that minimum is the inertia for x, and summing it over all rows gives the total inertia. This naive loop performs O(nk) distance evaluations for n points and k centers. The library function does the same amount of work but vectorized, so it is usually much faster and you might consider using it instead.
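For what it's worth, here is a minimal NumPy-only sketch of that idea, in case you want to avoid scikit-learn entirely. The names compute_inertia and elbow_costs are made up for illustration. The sketch assumes your kmeans function returns a dict mapping cluster label to centroid, as in the question, which is also why calling .fit on the result fails: a dict has no such method.

```python
import numpy as np

def compute_inertia(X, centers):
    """Sum of squared Euclidean distances from each row of X to its nearest center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Broadcasting builds an (n, k, d) array of differences, reduced to (n, k) squared distances
    diffs = X[:, None, :] - centers[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # For each point keep the distance to its closest center, then add them up
    return sq_dists.min(axis=1).sum()

def elbow_costs(X, k_values, kmeans_fn):
    """Run kmeans_fn(X, k) for each k and collect the resulting inertia.

    kmeans_fn is assumed to return a dict {cluster_label: centroid}.
    """
    costs = []
    for k in k_values:
        centroids = kmeans_fn(X, k)
        centers = np.array(list(centroids.values()))
        costs.append(compute_inertia(X, centers))
    return costs
```

With the code from the question you would call something like costs = elbow_costs(x, range(1, 11), kmeans) and plot costs against k; the elbow is where the curve stops dropping sharply. The broadcasting step allocates an n-by-k-by-d array, so for large data you may want to process X in chunks.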
Upvotes: 1