Josef

Reputation: 39

Python - Kmeans - Add the centroids as a new column

Assume I have the following dataframe. How can I create a new column "new_col" that contains, for each row, the centroid of the cluster it belongs to? I can only create the column with the labels, not with the centroids.

Here is my code.

import pandas as pd
from sklearn.cluster import KMeans

numbers = pd.DataFrame(list(range(1, 1000)), columns=['num'])

kmean_model = KMeans(n_clusters=5)
kmean_model.fit(numbers[['num']])

kmean_model.cluster_centers_
# array([[699. ],
#        [297. ],
#        [497.5],
#        [899.5],
#        [ 99. ]])

# This fills new_col with the cluster label (0-4) of each row, not its centroid
numbers['new_col'] = kmean_model.predict(numbers[['num']])

Upvotes: 1

Views: 1070

Answers (1)

Gilseung Ahn

Reputation: 2614

It is simple. Just use .labels_ as follows.

numbers['new_col'] = kmean_model.labels_

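Edit: sorry, my mistake. .labels_ gives the cluster labels, not the centroids.

Make a dictionary whose keys are the labels (0 to n_clusters - 1) and whose values are the centers, then replace the labels in new_col using that dictionary. See the following.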

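# Label i corresponds to row i of cluster_centers_; v[0] is the single 'num' coordinate
label_center_dict = {k: v[0] for k, v in enumerate(kmean_model.cluster_centers_)}
numbers['new_col'] = kmean_model.labels_
# Swap every label for its centroid value (avoids chained inplace replace)
numbers['new_col'] = numbers['new_col'].map(label_center_dict)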
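Because row i of cluster_centers_ is the centroid for label i, the dictionary can also be skipped entirely with NumPy integer-array indexing: indexing the (n_clusters, n_features) centers array with labels_ returns one centroid per row. A minimal sketch, assuming the single-feature dataframe from the question; the n_init and random_state arguments are my additions, only there to make the run reproducible.

```python
import pandas as pd
from sklearn.cluster import KMeans

numbers = pd.DataFrame(list(range(1, 1000)), columns=['num'])

# n_init and random_state are illustrative choices for a reproducible fit
kmean_model = KMeans(n_clusters=5, n_init=10, random_state=0)
kmean_model.fit(numbers[['num']])

# Index the (n_clusters, n_features) centers array with each row's label.
# Column 0 is taken because this data has a single feature ('num').
numbers['new_col'] = kmean_model.cluster_centers_[kmean_model.labels_, 0]

print(numbers.head())
```

With several features, dropping the `, 0` gives an (n_samples, n_features) array, which could be split into one centroid column per feature.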

Upvotes: 1
