Reputation: 35
I selected the feature columns for KMeans clustering:
x = df_1.iloc[:, np.r_[9:12,26:78]]
Then I ran this code to get 6 clusters:
kmeans = KMeans(n_clusters = 6)
kmeans.fit(x)
Now I want to add a column to my initial dataset (df_1["new"] = ...) holding the cluster number: 1 for rows in cluster one, 2 for rows in cluster two, etc.
How exactly do I do that?
Thanks!
Upvotes: 0
Views: 932
Reputation: 11395
You seem to be looking for fit_predict(x) (or fit(x).predict(x)), which returns the cluster index for each sample.
fit_predict(X, y=None, sample_weight=None)
Compute cluster centers and predict cluster index for each sample.
Convenience method; equivalent to calling fit(X) followed by predict(X).
So I suppose this would do (the labels start at 0, so add 1 if you want the clusters numbered from 1):
df_1['cluster'] = kmeans.fit_predict(x) + 1
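For completeness, here is a minimal end-to-end sketch of the idea. The random data stands in for df_1, and the column names, the column slice, `n_init=10`, and `random_state=0` are assumptions for illustration, not taken from the question. It also shows that after fitting, `kmeans.labels_` holds the same labels for the training data, so a model that has already been fitted does not need to be refitted.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for df_1: 60 rows, 5 numeric columns of random data.
rng = np.random.default_rng(0)
df_1 = pd.DataFrame(rng.normal(size=(60, 5)), columns=list("abcde"))

# Select feature columns by position, as in the question (here columns 1 to 3).
x = df_1.iloc[:, np.r_[1:4]]

# random_state is fixed only to make the sketch reproducible.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)

# fit_predict returns one label per row, numbered 0..5; add 1 to get 1..6.
df_1["cluster"] = kmeans.fit_predict(x) + 1

# Equivalent for an already fitted model: labels_ stores the training labels.
df_1["cluster_from_labels"] = kmeans.labels_ + 1
```

Because the labels come back in the same row order as x, the assignment lines up with df_1 as long as x was taken from df_1 without dropping or reordering rows.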
Upvotes: 2