GV-9wj

Reputation: 37

How to use KMeans for distance clustering

I have a dataframe with X and Y coordinate values.

They don't have any labels.

They look like this:

X-COORDINATE  Y-COORDINATE
12            34
99            42
90            27
49            64

Is it possible to use KMeans for clustering the data?

How do I get the labels and plot the data on a graph for each cluster?

Upvotes: 0

Views: 600

Answers (1)

Kami

Reputation: 193

Yes, you can use k-means even though you don't have labels, because k-means is an unsupervised method, but...

First of all, you should scale your data, because k-means is a distance-based algorithm: it uses the distances between data points to determine how similar they are, so a feature with a much larger range would otherwise dominate the result. More about that here. I found this tutorial on clustering very useful; you could start with that. It also describes how to use a silhouette or elbow plot first to choose a suitable number of clusters.
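To make the scaling step and the choice of the number of clusters more concrete, here is a minimal sketch using scikit-learn's `StandardScaler`, `KMeans` and `silhouette_score` on the four points from the question. The names `df`, `scaled`, `inertias`, `silhouettes` and `best_k` are just illustrative. With only four rows, the silhouette score can only be computed for 2 or 3 clusters; on a real dataset you would try a wider range. Also note that if both columns are coordinates in the same units, you may prefer to skip scaling, since standardizing changes the relative distances between points.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# The four points from the question
df = pd.DataFrame({"X-COORDINATE": [12, 99, 90, 49],
                   "Y-COORDINATE": [34, 42, 27, 64]})

# Standardize each column to mean 0 and standard deviation 1
scaled = StandardScaler().fit_transform(df)

inertias = {}     # within-cluster sum of squares, for an elbow plot
silhouettes = {}  # silhouette score per k (higher is better)
for k in range(2, len(df)):  # silhouette needs 2 <= k <= n_samples - 1
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(scaled, model.labels_)

# Pick the k with the highest silhouette score
best_k = max(silhouettes, key=silhouettes.get)
print(inertias)
print(silhouettes)
print("best k:", best_k)
```

For an elbow plot you would plot `inertias` against `k` and look for the point where the curve stops dropping sharply.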

It should look something like this:

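import pandas as pd
from sklearn.cluster import KMeans

# Example data from the question
your_dataframe = pd.DataFrame({"X-COORDINATE": [12, 99, 90, 49],
                               "Y-COORDINATE": [34, 42, 27, 64]})

n_clusters = 2  # you can get n_clusters from a silhouette/elbow plot or just try out different numbers

# n_init and random_state are set explicitly so the result is reproducible
kmeans_model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)

# fit_predict fits the model and returns the cluster label of every row
labels = kmeans_model.fit_predict(your_dataframe)

print(labels)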
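For the plotting part of the question, a minimal sketch with matplotlib (an assumption; any plotting library would do) could colour each point by its k-means label and mark the centroids. It uses the question's four points and `n_clusters=2` purely for illustration; the `cluster` column name is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame({"X-COORDINATE": [12, 99, 90, 49],
                   "Y-COORDINATE": [34, 42, 27, 64]})

kmeans_model = KMeans(n_clusters=2, n_init=10, random_state=0)
# Store each row's cluster label in a new column
df["cluster"] = kmeans_model.fit_predict(df[["X-COORDINATE", "Y-COORDINATE"]])

fig, ax = plt.subplots()
# One scatter call per cluster, so each cluster gets its own colour and legend entry
for cluster_id, group in df.groupby("cluster"):
    ax.scatter(group["X-COORDINATE"], group["Y-COORDINATE"], label=f"Cluster {cluster_id}")

# Centroids are in the same column order as the training data (X, then Y)
centers = kmeans_model.cluster_centers_
ax.scatter(centers[:, 0], centers[:, 1], marker="x", color="black", s=100, label="Centroids")

ax.set_xlabel("X-COORDINATE")
ax.set_ylabel("Y-COORDINATE")
ax.legend()
plt.show()
```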

K-means does not always perform well. If you want better results, you could also try other algorithms such as DBSCAN, HDBSCAN or agglomerative clustering. Which one you should choose always depends on your data.
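As a rough sketch of two of those alternatives in scikit-learn, applied to the question's points: DBSCAN does not need a number of clusters but needs a neighbourhood radius `eps` (in the data's units) and marks points it considers noise with `-1`, while `AgglomerativeClustering` still takes `n_clusters`. The values `eps=20`, `min_samples=2` and `n_clusters=2` are illustrative guesses for this tiny example, not tuned settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering

points = np.array([[12, 34], [99, 42], [90, 27], [49, 64]])

# DBSCAN: points with at least min_samples neighbours within eps form clusters;
# everything else is labelled -1 (noise)
db_labels = DBSCAN(eps=20, min_samples=2).fit_predict(points)

# Agglomerative clustering (Ward linkage by default) merges the closest groups
# bottom-up until n_clusters remain
agg_labels = AgglomerativeClustering(n_clusters=2).fit_predict(points)

print("DBSCAN:", db_labels)
print("Agglomerative:", agg_labels)
```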

Upvotes: 2
