Assume I have the following dataframe. How can I create a new column "new_col" that contains, for each row, the centroid of the cluster that row belongs to? So far I can only create the column with the cluster labels, not with the centroids.
Here is my code.
import pandas as pd
from sklearn.cluster import KMeans
# one numeric feature: the integers 1..999
numbers = pd.DataFrame(list(range(1, 1000)), columns=['num'])
kmean_model = KMeans(n_clusters=5)
kmean_model.fit(numbers[['num']])
kmean_model.cluster_centers_
# output (the order of the clusters can differ between runs):
array([[699. ],
       [297. ],
       [497.5],
       [899.5],
       [ 99. ]])
numbers['new_col'] = kmean_model.predict(numbers[['num']])  # gives each row's cluster label (0 to 4), not its centroid
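For context, here is a minimal sketch of what predict returns on this data: an integer array with one cluster index per row, where each index is a row number into cluster_centers_. The n_init=10 and random_state=0 arguments are assumptions added only to make the run repeatable; they are not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

numbers = pd.DataFrame(list(range(1, 1000)), columns=['num'])
# n_init and random_state are added only for a repeatable run
kmean_model = KMeans(n_clusters=5, n_init=10, random_state=0)
kmean_model.fit(numbers[['num']])

labels = kmean_model.predict(numbers[['num']])
print(labels.shape)                        # (999,): one label per row
print(np.unique(labels))                   # [0 1 2 3 4]: indices, not coordinates
print(kmean_model.cluster_centers_.shape)  # (5, 1): one centroid row per label
```

So the label is only the lookup key; the centroid value itself still has to be fetched from cluster_centers_.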
It is simple. Just use .labels_ as follows:
numbers['new_col'] = kmean_model.labels_
Edit: sorry, my mistake. .labels_ also holds the cluster labels, not the centroids.
Make a dictionary whose keys are the labels and whose values are the corresponding centers, then replace the labels in new_col using that dictionary:
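# cluster_centers_[i] is the centroid of label i, so key the dict by position;
# [:, 0] takes the scalar centroid of the single 'num' feature
label_center_dict = dict(enumerate(kmean_model.cluster_centers_[:, 0]))
numbers['new_col'] = kmean_model.labels_
# map() swaps each label for its centroid value
numbers['new_col'] = numbers['new_col'].map(label_center_dict)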
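A shorter alternative, sketched here as an assumption rather than as the answerer's method: because label i is row i of cluster_centers_, NumPy integer-array indexing maps every label to its centroid in one step, without a dictionary. The n_init=10 and random_state=0 arguments and the via_dict name are illustrative additions.

```python
import pandas as pd
from sklearn.cluster import KMeans

numbers = pd.DataFrame(list(range(1, 1000)), columns=['num'])
# n_init and random_state are added only for a repeatable run
kmean_model = KMeans(n_clusters=5, n_init=10, random_state=0)
kmean_model.fit(numbers[['num']])

# Row i of cluster_centers_ is the centroid of label i, so indexing with the
# label array picks each row's centroid; [:, 0] selects the single 'num' feature.
numbers['new_col'] = kmean_model.cluster_centers_[kmean_model.labels_, 0]

# The dictionary approach (keyed by label index) yields the same column
label_center_dict = dict(enumerate(kmean_model.cluster_centers_[:, 0]))
via_dict = pd.Series(kmean_model.labels_, index=numbers.index).map(label_center_dict)

# Range of 'num' values that share each centroid
print(numbers.groupby('new_col')['num'].agg(['min', 'max']))
```

If the model were fit on several feature columns, kmean_model.cluster_centers_[kmean_model.labels_] (without the ", 0") returns one full centroid row per sample, which can be assigned to several new columns.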