How to run K-means on specific columns?

Question

I would like to run K-means on specific columns of my data set. Since these columns hold categorical data, I plan to one-hot encode them. Is it possible to run K-means on specific columns only and then display the results (the rows of one cluster, for example) with all the columns?

For example, I have col1, col2 and col3. I want to run K-means on col2 and col3, which are one-hot encoded, and display the results with col1, col2 and col3. I hope I have expressed my question clearly.

PV8 · Accepted Answer

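This follows the basic scikit-learn documentation for KMeans:

from sklearn.cluster import KMeans
# df is your existing DataFrame; select only the columns to cluster on
# (they must be numeric, so one-hot encode categorical columns first)
X = df[['col1', 'col2', 'col3']]
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
# predict returns the cluster label (group) of each row
kmeans.predict(X)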

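So kmeans.predict returns the cluster label of each row, which you can add as a new column to your original data. Every column, including the ones you did not cluster on, is then available when you display a group.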
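
To make this concrete for the case in the question, here is a minimal sketch, assuming a small made-up DataFrame in which col2 and col3 are categorical and col1 is only shown in the results. The sample values, the "cluster" column name and the choice of pd.get_dummies for the encoding are my assumptions, not something given in the question; scikit-learn's OneHotEncoder would work as well.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: col1 is kept for display only,
# col2 and col3 are the categorical columns to cluster on.
df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f"],
    "col2": ["red", "red", "blue", "blue", "red", "blue"],
    "col3": ["small", "small", "large", "large", "small", "large"],
})

# One-hot encode only the columns used for clustering.
X = pd.get_dummies(df[["col2", "col3"]], dtype=float)

# n_init is set explicitly because its default differs between
# scikit-learn versions.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10)

# Store the cluster label of each row next to the original columns.
df["cluster"] = kmeans.fit_predict(X)

# Display one group with all original columns (col1, col2, col3).
group0 = df[df["cluster"] == 0]
print(group0)
```

Because the labels are written back to df by row position, the encoded matrix X is only needed for fitting. Filtering df on the "cluster" column gives each group with col1, col2 and col3 intact.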
