Reputation: 37
I have a dataframe with X and Y axis values.
They don't have any labels.
They look like this:
X-COORDINATE | Y-COORDINATE |
---|---|
12 | 34 |
99 | 42 |
90 | 27 |
49 | 64 |
Is it possible to use KMeans for clustering the data?
How do I get the labels and plot the data on a graph for each cluster?
Upvotes: 0
Views: 600
Reputation: 193
Yes, you can use k-means even if you don't have labels because k-means is an unsupervised method, but...
First of all you need to scale your data, because k-means is a distance-based algorithm that uses the distances between data points to determine their similarity. More about that here. I found this tutorial for clustering very useful; you could start with that. It also describes how to plot your data first with a silhouette or elbow plot to choose a suitable number of clusters.
The basic k-means call should look something like this:
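As a rough sketch of the scaling and cluster-count steps described above, the snippet below standardizes both columns with `StandardScaler` and compares silhouette scores for a few candidate values of k. The first four rows are taken from the question; the remaining rows, the column names `X` and `Y`, and the candidate range 2 to 5 are assumptions added only so that several values of k can be scored.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Illustrative data: the first four rows come from the question,
# the rest are made-up points added so several k values can be compared.
df = pd.DataFrame({
    "X": [12, 99, 90, 49, 15, 95, 88, 52, 10, 47],
    "Y": [34, 42, 27, 64, 30, 40, 31, 60, 37, 66],
})

# Standardize each column to zero mean and unit variance
scaled = StandardScaler().fit_transform(df)

# Score a few candidate cluster counts; a higher silhouette score is better
scores = {}
for k in range(2, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    cluster_ids = model.fit_predict(scaled)
    scores[k] = silhouette_score(scaled, cluster_ids)

best_k = max(scores, key=scores.get)
print(scores)
print("best k by silhouette:", best_k)
```

The silhouette score is only one heuristic; on real data it is worth checking it against an elbow plot or simply inspecting the clusters for a few values of k.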
from sklearn.cluster import KMeans
n_clusters = 3 # example value; you can get n_clusters from a silhouette/elbow plot or just try out different numbers
kmeans_model = KMeans(n_clusters=n_clusters)
# fit_predict fits the model and returns one cluster label per row
labels = kmeans_model.fit_predict(your_dataframe)
print(labels)
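To cover the plotting part of the question, here is a minimal sketch that stores the labels in a new column and draws one scatter series per cluster with matplotlib. The first four rows are from the question; the other rows, the column names, `n_clusters=3`, and the output file name `clusters.png` are all assumptions for illustration. The data is left unscaled here on the assumption that both coordinates share the same units.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

# Illustrative data: the first four rows come from the question, the rest are made up
df = pd.DataFrame({
    "X": [12, 99, 90, 49, 15, 95, 88, 52],
    "Y": [34, 42, 27, 64, 30, 40, 31, 60],
})

# n_clusters=3 is an assumed value chosen for this toy data
kmeans_model = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans_model.fit_predict(df[["X", "Y"]])

# Draw one scatter series per cluster so each gets its own color and legend entry
fig, ax = plt.subplots()
for cluster_id, group in df.groupby("cluster"):
    ax.scatter(group["X"], group["Y"], label=f"cluster {cluster_id}")

# Mark the fitted centroids
centers = kmeans_model.cluster_centers_
ax.scatter(centers[:, 0], centers[:, 1], marker="x", color="black", label="centroids")

ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.legend()
fig.savefig("clusters.png")  # or plt.show() in an interactive session
```

Keeping the labels as a dataframe column makes it easy to filter or group the rows by cluster afterwards.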
K-means does not always perform well. If you want better results, you could also try other algorithms such as DBSCAN, HDBSCAN, or agglomerative clustering. Which one you should choose always depends on your data.
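For comparison, two of the alternatives mentioned above can be tried with nearly the same code in scikit-learn. This is only a sketch: the data beyond the first four rows is made up, and `eps=0.5`, `min_samples=2`, and `n_clusters=3` are assumed values that would need tuning on real data.

```python
import pandas as pd
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Illustrative data: the first four rows come from the question, the rest are made up
df = pd.DataFrame({
    "X": [12, 99, 90, 49, 15, 95, 88, 52],
    "Y": [34, 42, 27, 64, 30, 40, 31, 60],
})
scaled = StandardScaler().fit_transform(df)

# DBSCAN needs no cluster count; eps and min_samples are assumed values for this toy data
dbscan_labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(scaled)

# Agglomerative clustering merges points bottom-up until n_clusters remain
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(scaled)

print("DBSCAN:", dbscan_labels)  # a label of -1 marks points DBSCAN treats as noise
print("Agglomerative:", agglo_labels)
```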
Upvotes: 2