When to stop agglomerative hierarchical clustering - stopping criteria

Question

I am coding every function of my application myself, so I am not using tools that do everything for you.

I have been looking for a way to decide when to cut my agglomerative hierarchical clustering.

How do I cluster?

I have coded the application in C# (.NET 4.5.2).

So far I am using standard agglomerative hierarchical clustering, with Euclidean distance to calculate the distance between pairs of documents.
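
A minimal sketch of the document-to-document distance step, assuming each document is a bag-of-words vector of term counts built with naive whitespace tokenization. The helpers `term_vector` and `euclidean_distance` are hypothetical names used only for illustration.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts (naive lowercase + whitespace split; an assumption)."""
    return Counter(text.lower().split())

def euclidean_distance(a, b):
    """Euclidean distance between two sparse term-count vectors (dict term -> count)."""
    terms = set(a) | set(b)  # a term missing from one document counts as 0 there
    return math.sqrt(sum((a.get(t, 0) - b.get(t, 0)) ** 2 for t in terms))

docs = ["apple banana apple", "apple banana", "car engine wheel"]
vectors = [term_vector(d) for d in docs]
# Full pairwise distance matrix, which is what the clustering step consumes.
dist = [[euclidean_distance(u, v) for v in vectors] for u in vectors]
print(dist[0][1])  # 1.0: only the count of "apple" differs, by 1
```

Storing documents as sparse dictionaries keeps the cost proportional to the number of distinct terms in the two documents, not to the whole vocabulary.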

It also uses UPGMA (average linkage) to calculate the distance between clusters and decide which ones to merge.
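
UPGMA defines the distance between two clusters as the mean of all pairwise distances between their members. A minimal, naive O(n^3) sketch, assuming a precomputed distance matrix `dist`; `average_linkage` and `agglomerate` are hypothetical names. It records the distance at which every merge happens, since that merge history is what most stopping rules look at.

```python
from itertools import combinations

def average_linkage(dist, cluster_a, cluster_b):
    """UPGMA distance: mean of all pairwise distances between members of two clusters."""
    total = sum(dist[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(dist):
    """Merge the closest pair of clusters until one is left.

    Returns the merges in order as (linkage_distance, merged_cluster),
    where clusters are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average-linkage distance.
        (x, y), d = min(
            (((p, q), average_linkage(dist, clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
        merges.append((d, merged))
    return merges

# Four 1-D points: two tight pairs far apart from each other.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
for d, cluster in agglomerate(dist):
    print(f"merged {cluster} at distance {d}")
```

Recomputing every linkage on each pass is only reasonable for small data sets; larger implementations usually update the distance matrix in place with the Lance-Williams formula.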

I have also coded the Rand index and F-measure to evaluate the results against my manually labeled data set.
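
For reference, the Rand index is the fraction of point pairs on which two labelings agree: both put the pair in the same cluster, or both put it in different clusters. A minimal sketch (`rand_index` is a hypothetical name); applied to the flat clustering at each candidate cut, it is one way to compare cut levels on the labeled data set.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two labelings agree (same/same or different/different)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    if len(labels_true) < 2:
        raise ValueError("need at least two points to form a pair")
    agree = pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred  # True counts as 1
        pairs += 1
    return agree / pairs

# Pairs (0,1), (0,3), (1,3) agree; (0,2), (1,2), (2,3) do not.
print(rand_index(["a", "a", "b", "b"], [0, 0, 0, 1]))  # 0.5
```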

However, the problem is deciding when to stop merging clusters.
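
Two common stopping heuristics work directly on the merge history, i.e. the linkage distance of each merge in the order it happened. The first stops before the first merge whose distance exceeds a chosen threshold (which could, for example, be tuned on the labeled data). The second cuts the dendrogram at the largest jump between consecutive merge distances. A minimal sketch with hypothetical function names:

```python
def clusters_by_threshold(merge_distances, n_points, threshold):
    """Stop before the first merge whose linkage distance exceeds `threshold`.

    merge_distances: linkage distance of each merge, in the order the merges happened.
    Returns the number of clusters left at that point.
    """
    merges_done = 0
    for d in merge_distances:
        if d > threshold:
            break
        merges_done += 1
    return n_points - merges_done

def clusters_by_largest_gap(merge_distances, n_points):
    """Cut where the linkage distance jumps the most between two consecutive merges."""
    if len(merge_distances) < 2:
        raise ValueError("need at least two merges to compare gaps")
    gaps = [merge_distances[k + 1] - merge_distances[k]
            for k in range(len(merge_distances) - 1)]
    k = max(range(len(gaps)), key=lambda i: gaps[i])
    # Keep merges 0..k (that is, k + 1 merges) and skip the merge after the big jump.
    return n_points - (k + 1)

# Merge distances for four points at 0, 1, 10 and 11 with average linkage.
history = [1.0, 1.0, 10.0]
print(clusters_by_threshold(history, 4, threshold=5.0))  # 2
print(clusters_by_largest_gap(history, 4))                # 2
```

With UPGMA the merge distances never decrease, so both rules are well defined. Note that the largest-gap rule always leaves at least two clusters, so it cannot tell you that the data has no structure at all.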

I have a hard time understanding mathematical equations without an example on real data or well-explained pseudocode.

There are mathematical equations everywhere, but no real-life examples.

So I am looking for your answers. For example, many places say that the Bayesian information criterion (BIC) is a good solution, but I can't figure out how to apply it to my software.
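
One common way to apply BIC to a clustering, sketched here under a strong assumption: each cluster is modeled as a spherical Gaussian, all clusters share one variance, and cluster weights are the cluster sizes. You cut the dendrogram at several candidate numbers of clusters, score each flat clustering, and keep the lowest score. Unlike the distance-based rules, this needs the document vectors themselves, not just the distance matrix. `bic_spherical` is a hypothetical name, and the Gaussian model is a crude fit for sparse, high-dimensional term vectors.

```python
import math

def bic_spherical(points, labels):
    """BIC of a hard clustering under a shared-variance spherical Gaussian model.

    points: list of equal-length numeric vectors; labels: cluster id per point.
    Returns n_params * ln(R) - 2 * loglik, where lower is better.
    """
    R = len(points)     # number of points
    M = len(points[0])  # number of dimensions
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    K = len(clusters)

    # Sum of squared distances from each point to its own cluster mean.
    sse = 0.0
    for members in clusters.values():
        mean = [sum(col) / len(members) for col in zip(*members)]
        sse += sum((v - m) ** 2 for x in members for v, m in zip(x, mean))
    if sse == 0.0:
        raise ValueError("zero within-cluster variance; BIC is undefined for this cut")
    var = sse / (R * M)  # maximum-likelihood estimate of the shared variance

    # Log-likelihood with mixing weights R_j / R and the ML variance plugged in;
    # the squared-error term then simplifies to R * M / 2.
    loglik = sum(len(m) * math.log(len(m) / R) for m in clusters.values())
    loglik -= (R * M / 2) * math.log(2 * math.pi * var) + R * M / 2

    n_params = (K - 1) + K * M + 1  # mixing weights + cluster means + shared variance
    return n_params * math.log(R) - 2 * loglik

data = [[0.0], [1.0], [10.0], [11.0]]
for labels in ([0, 0, 0, 0], [0, 0, 1, 1], [0, 1, 2, 2]):
    print(len(set(labels)), "clusters:", round(bic_spherical(data, labels), 2))
```

On this toy data the two-cluster cut gets the lowest score. Cuts where every cluster has zero spread (for example, all singletons) are rejected rather than scored, because the likelihood is unbounded there.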

I also have other distance and similarity metrics implemented, such as cosine similarity and Sørensen-Dice distance.
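
A minimal sketch of those two measures, assuming the same sparse term-count representation (dict term -> count). Cosine similarity uses the counts; this Sørensen-Dice version compares only the sets of terms. To use cosine in the linkage step, 1 - similarity can serve as the distance. Function names are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors (dict term -> count)."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty document as dissimilar to everything
    return dot / (norm_a * norm_b)

def sorensen_dice_distance(terms_a, terms_b):
    """1 - 2|A & B| / (|A| + |B|) on the sets of terms (counts are ignored)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return 1.0 - 2 * len(a & b) / (len(a) + len(b))

d1 = Counter("apple banana apple".split())
d2 = Counter("apple banana".split())
print(cosine_similarity(d1, d2))       # 3 / sqrt(10), about 0.949
print(sorensen_dice_distance(d1, d2))  # 0.0: both use the same set of terms
```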

There are many questions about this on Stack Exchange and Stack Overflow, but all the answers rely on tools like MATLAB or R.
