Asa Ya

Reputation: 69

Determining the number of clusters for k-means in Python

I have a trajectory dataset saved in a *.csv file, and I split it into separate files by month. The number of records differs from file to file: for example, January has ten thousand records, while April has five hundred thousand.

I am going to perform k-means clustering in Python on each file. Could you please tell me how I can determine the best number of clusters, K, to use for each one?

Thank you

Upvotes: 0

Views: 125

Answers (1)

Alex Metsai

Reputation: 1950

You can use the elbow method.

In cluster analysis, the elbow method is a heuristic used in determining the number of clusters in a data set. The method consists of plotting the explained variation as a function of the number of clusters, and picking the elbow of the curve as the number of clusters to use. The same method can be used to choose the number of parameters in other data-driven models, such as the number of principal components to describe a data set.

Don't let the above description scare you; it's actually quite easy to do. Here's a quick tutorial.
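Here is a minimal sketch of the elbow method, assuming each monthly file has already been loaded into a NumPy array `X` of numeric features (which columns of the trajectory data to cluster on is an assumption left to you). It runs k-means for a range of K values, records the inertia (the within-cluster sum of squared distances, the usual "unexplained variation" measure) for each K, and then picks the elbow automatically as the point farthest from the straight line joining the first and last points of the curve. That automatic rule and the helper names `kmeans_pp_init`, `kmeans_inertia` and `elbow_k` are illustrative choices of mine, not part of the method itself; plotting `ks` against `inertias` and eyeballing the bend works just as well. The demo uses synthetic data with three blobs, not the files from the question.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """Choose k initial centroids with k-means++ seeding."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest centroid so far.
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Sample the next centroid with probability proportional to d2.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans_inertia(X, k, n_init=5, max_iter=100, seed=0):
    """Run Lloyd's k-means n_init times and return the lowest inertia found
    (sum of squared distances from each point to its nearest centroid)."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            # Assignment step: label each point with its nearest centroid.
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            # Update step: move each centroid to the mean of its points
            # (an empty cluster keeps its old centroid).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        best = min(best, d2.sum())
    return best


def elbow_k(ks, inertias):
    """Return the K whose (K, inertia) point lies farthest from the straight
    line joining the first and last points of the curve (axes scaled to [0, 1])."""
    x = np.asarray(ks, dtype=float)
    y = np.asarray(inertias, dtype=float)
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())
    # Perpendicular distance from each point to the chord from (0, yn[0]) to (1, yn[-1]).
    dist = np.abs((yn[-1] - yn[0]) * xn - yn + yn[0]) / np.hypot(1.0, yn[-1] - yn[0])
    return int(x[dist.argmax()])


# Demo on synthetic data: three well-separated 2-D blobs (not the asker's files).
demo_rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [8.0, 0.0], [4.0, 7.0]])
X = np.vstack([c + demo_rng.normal(scale=0.5, size=(100, 2)) for c in true_centers])

ks = list(range(1, 9))
inertias = [kmeans_inertia(X, k) for k in ks]
print("inertia per K:", [round(float(v), 1) for v in inertias])
print("elbow at K =", elbow_k(ks, inertias))
```

For files with hundreds of thousands of rows, this NumPy version builds an n × K distance array on every iteration, so scikit-learn's `KMeans` (or `MiniBatchKMeans`), whose fitted models expose the same quantity as `inertia_`, is probably the more practical choice; the elbow-picking step stays the same. Run it separately on each monthly file, since the best K may differ from month to month.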

Upvotes: 1
