Reputation: 4640
After doing clustering I end up with an object which stores all the cluster labels, something like this:
clusterer.labels_
The above is typically a list or an array. Then I always assign the labels to the original pandas dataframe (dataset) like this:
df['cluster_labels'] = clusterer.labels_
In the end I assume that each element of clusterer.labels_ corresponds to the row at the same position in my original dataset. Is that assumption correct? For example, the column creation above gives me something like this:
ColA ColB cluster_labels
1 3 -1
2 4 2
...
89 90 45
Upvotes: 1
Views: 380
Reputation: 21
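Broadly yes, you are right. The only type of clustering I have used before is KMeans (https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html), so I can't guarantee that every clusterer works the same way. For KMeans, labels_ holds one label per row, in the same order as the rows you passed to fit. Appending it as a new column to the dataframe will work as you expect, provided the dataframe still has the same rows in the same order as the data you fitted on.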
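If you want to convince yourself, here is a minimal sketch of a check, assuming scikit-learn's KMeans and a made-up toy dataframe (the values, the index, and n_clusters=2 are illustrative assumptions, not taken from your data). It confirms that labels_ has one entry per row and that the labels match what predict returns for the same rows in the same order.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups, with a non-default index
df = pd.DataFrame(
    {
        "ColA": [1.0, 1.2, 0.9, 10.0, 10.3, 9.8],
        "ColB": [3.0, 3.1, 2.9, 20.0, 20.2, 19.9],
    },
    index=[10, 11, 12, 13, 14, 15],
)
features = df[["ColA", "ColB"]]

clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# labels_ has exactly one entry per fitted row
assert len(clusterer.labels_) == len(df)

# Assigning a NumPy array to a column is positional: element i goes to row i,
# regardless of what the dataframe's index values are
df["cluster_labels"] = clusterer.labels_

# Cross-check: predicting on the same rows gives the same labels, row by row
assert (clusterer.predict(features) == df["cluster_labels"].to_numpy()).all()

print(df)
```

If you dropped or reordered rows between fitting and assigning, this no longer holds: a different row count makes the assignment raise a ValueError, while a reordering silently attaches labels to the wrong rows.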
Upvotes: 1