Ayush Chordia

Reputation: 289

Scikit learn and scipy giving different results with Agglomerative clustering with euclidean metric

I am trying to cluster timestamps for different speaker embeddings. It works best with Agglomerative clustering using euclidean as the affinity and ward as the linkage, with k clusters. I am trying to reproduce that output with scipy's hierarchy.fclusterdata and have tried different thresholds, but the results look completely random.

However, I can match the output of both algorithms if cosine is used as the affinity and complete as the linkage. What could be the reason for the mismatch with the euclidean metric? Here is the code:

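from scipy.cluster import hierarchy
from sklearn.cluster import AgglomerativeClustering
# scikit-learn: ask for exactly k clusters
clt = AgglomerativeClustering(n_clusters=k, affinity='euclidean', linkage='ward')
res = clt.fit_predict(embeddings)

# scipy: cut the tree at a distance threshold of 0.95
res = hierarchy.fclusterdata(embeddings, t=0.95, criterion='distance', method='ward', metric='euclidean')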

I pass the same data to both clustering calls; embeddings is a numpy array of timestamps.

Upvotes: 0

Views: 1483

Answers (1)

Ugurite

Reputation: 543

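Try using criterion='maxclust' instead. With criterion='distance', t is a distance threshold for cutting the tree, so the number of clusters you get depends on the scale of your data. With criterion='maxclust', t is the maximum number of clusters, so you can pass k directly.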

res = hierarchy.fclusterdata(embeddings, k, criterion='maxclust', method='ward', metric='euclidean')
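To check that the two libraries agree, something like the following rough sketch could be used. The synthetic blobs are only a stand-in for the real embeddings (an assumption, since the original data is not shown), and the partitions are compared with adjusted_rand_score because the two libraries number their labels differently (scikit-learn starts at 0, scipy at 1). The affinity argument is left out here because euclidean is the default for ward.

```python
import numpy as np
from scipy.cluster import hierarchy
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the embeddings (assumption, not the asker's data):
# three well-separated Gaussian blobs in 8 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 8, [10.0] * 8, [-10.0] * 8])
embeddings = np.vstack([c + rng.normal(scale=0.5, size=(20, 8)) for c in centers])
k = 3

# scikit-learn: Ward linkage, exactly k clusters.
sk_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(embeddings)

# scipy: Ward linkage, at most k flat clusters.
sp_labels = hierarchy.fclusterdata(
    embeddings, t=k, criterion="maxclust", method="ward", metric="euclidean"
)

# Compare the partitions while ignoring how the labels are numbered.
ari = adjusted_rand_score(sk_labels, sp_labels)
print("adjusted Rand index:", ari)

# For contrast: with criterion='distance', t=0.95 is a distance cutoff,
# so the number of clusters is whatever the data happens to produce.
dist_labels = hierarchy.fclusterdata(
    embeddings, t=0.95, criterion="distance", method="ward", metric="euclidean"
)
n_dist_clusters = len(np.unique(dist_labels))
print("clusters with distance threshold 0.95:", n_dist_clusters)
```

If the adjusted Rand index is 1.0, both calls produced the same grouping. On real embeddings a small difference can still show up when several merges happen at identical distances.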

Upvotes: 1
