KMeans Clustering: adding results to an initial dataset

Question

I defined the features for clustering with KMeans:

x = df_1.iloc[:, np.r_[9:12,26:78]]
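For readers unfamiliar with np.r_: it concatenates the slices into one array of integer positions, so the line above selects columns 9-11 and 26-77 by position. A small sketch, using a hypothetical all-zeros frame with 80 columns as a stand-in for df_1:

```python
import numpy as np
import pandas as pd

# np.r_ turns the two slices into a single integer index array.
cols = np.r_[9:12, 26:78]
print(cols[:5])   # positions 9, 10, 11, 26, 27
print(len(cols))  # 3 + 52 = 55 positions

# Hypothetical stand-in for df_1 with 80 numeric columns.
df_1 = pd.DataFrame(np.zeros((4, 80)))
x = df_1.iloc[:, cols]  # positional column selection, all rows kept
print(x.shape)          # 4 rows, 55 columns
```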

Then I ran the code to get 6 clusters:

kmeans = KMeans(n_clusters=6)
kmeans.fit(x)

Now I want my initial dataset to have a new column (df_1["new"] = ...) holding the cluster number: 1 for the rows in cluster one, 2 for the rows in cluster two, etc.

How exactly do I do that?

Thanks!

Cimbali · Accepted Answer

You seem to be looking for fit_predict(x) (or fit(x).predict(x)), which returns the cluster for each sample.

fit_predict(X, y=None, sample_weight=None)
Compute cluster centers and predict cluster index for each sample.
Convenience method; equivalent to calling fit(X) followed by predict(X).
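Since the question already calls kmeans.fit(x), a second fit may not be needed: scikit-learn stores the training-set labels in the fitted model's labels_ attribute, and predict() on the same data assigns each sample to its nearest center, so the two should agree. Calling fit_predict again refits the model, and with a random initialization the cluster numbering can differ from the earlier fit. A minimal sketch on hypothetical random data (three blobs), not on df_1:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: 60 samples in 3 well-separated 2-D blobs.
x = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in (0, 5, 10)])

# random_state makes the fit reproducible; n_init is set explicitly.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(x)

# After fit(), the label of each training sample is stored in labels_ ...
labels_from_fit = kmeans.labels_
# ... and predict() on the same data assigns each sample to its nearest center.
labels_from_predict = kmeans.predict(x)
print((labels_from_fit == labels_from_predict).all())
```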

So I suppose this would do it:

df_1['new'] = kmeans.fit_predict(x)
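One detail the question asks for: KMeans labels start at 0, so for 6 clusters fit_predict returns 0 to 5. Adding 1 gives the 1 to 6 numbering requested. A minimal end-to-end sketch, assuming a hypothetical 30-row frame with 5 feature columns in place of the real df_1 (the column positions passed to np.r_ are illustrative only):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Hypothetical stand-in for df_1: 30 rows, 5 numeric feature columns.
df_1 = pd.DataFrame(rng.normal(size=(30, 5)), columns=[f"f{i}" for i in range(5)])

# Select feature columns by position, as the question does with np.r_.
x = df_1.iloc[:, np.r_[0:2, 3:5]]

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
# fit_predict returns one label (0..5) per row, in the same order as x,
# so it can be assigned straight back to df_1; +1 shifts to 1..6.
df_1["new"] = kmeans.fit_predict(x) + 1

print(df_1["new"].head())
print(sorted(df_1["new"].unique()))
```

Assigning the NumPy array works positionally, which is correct here because x keeps every row of df_1 in the same order. If rows were dropped before fitting (for example with dropna), the labels would need to be assigned back through the matching index instead.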
