Reputation: 289
I am trying to cluster timestamps for different speaker embeddings. It works best with Agglomerative clustering using euclidean as the affinity, ward as the linkage, and k clusters. I am trying to reproduce that output with scipy's hierarchy.fclusterdata and have tried different thresholds, but I am getting completely random results.
However, I can match the output of both algorithms if cosine is used as the affinity and complete as the linkage. What could be the reason for the skewed results with the euclidean metric? Here is the code:
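from scipy.cluster import hierarchy
from sklearn.cluster import AgglomerativeClustering
clt = AgglomerativeClustering(n_clusters=k, affinity='euclidean', linkage='ward')  # fixed number of clusters
res = clt.fit_predict(embeddings)
res = hierarchy.fclusterdata(embeddings, t=0.95, criterion='distance', method='ward', metric='euclidean')  # cut at distance t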
I pass the same data to both clusterings this way, assuming embeddings is a numpy array of timestamps.
Upvotes: 0
Views: 1483
Reputation: 543
Try using criterion='maxclust' instead. This way the t argument is read as the number of clusters you want rather than as a cut-off distance.
res = hierarchy.fclusterdata(embeddings, k, criterion='maxclust', method='ward', metric='euclidean')
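To check the suggestion, here is a minimal sketch on synthetic data. The blobs, the seed and the variable names `sk_labels`, `sp_labels` and `n_distance` are assumptions for illustration, not the asker's embeddings. It compares the two partitions with the adjusted Rand index, since sklearn labels start at 0 and scipy labels start at 1 and their order is arbitrary. It also counts how many clusters the original criterion='distance' call produces, because t=0.95 is then a height on the Ward dendrogram, and Ward merge heights are not on the same scale as cosine distances.

```python
import numpy as np
from scipy.cluster import hierarchy
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the embeddings (assumption): three
# well-separated blobs of 20 points each in 8 dimensions.
rng = np.random.default_rng(0)
k = 3
centers = 20.0 * np.eye(k, 8)
embeddings = np.vstack([c + rng.normal(size=(20, 8)) for c in centers])

# scikit-learn: Ward linkage (Euclidean is the default metric), k clusters.
sk_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(embeddings)

# SciPy: with criterion='maxclust', t is the maximum number of flat clusters.
sp_labels = hierarchy.fclusterdata(
    embeddings, t=k, criterion="maxclust", method="ward", metric="euclidean"
)

# Compare the partitions rather than the raw label ids.
ari = adjusted_rand_score(sk_labels, sp_labels)
print("adjusted Rand index:", ari)

# The original call: t=0.95 is a distance threshold on the dendrogram,
# so the number of clusters depends on the scale of the data.
dist_labels = hierarchy.fclusterdata(
    embeddings, t=0.95, criterion="distance", method="ward", metric="euclidean"
)
n_distance = len(np.unique(dist_labels))
print("clusters with criterion='distance', t=0.95:", n_distance)
```

If the distance criterion is still preferred, the threshold would have to be picked from the merge heights in the third column of the matrix returned by hierarchy.linkage, not reused from the cosine setup.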
Upvotes: 1