scikit-learn and SciPy give different results for agglomerative clustering with the Euclidean metric

Question

I am trying to cluster timestamps for different speaker embeddings. It works best with scikit-learn's AgglomerativeClustering when euclidean is used as the affinity and ward as the linkage, with k clusters. I am trying to reproduce that output with scipy's hierarchy.fclusterdata and have tried different thresholds, but the results look completely random.

However, I can match the output of the two implementations if cosine is used as the affinity and complete as the linkage. What could explain the mismatch with the euclidean metric? Here is the code:

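from scipy.cluster import hierarchy
from sklearn.cluster import AgglomerativeClustering
# scikit-learn: request exactly k clusters (affinity was renamed to metric in scikit-learn 1.2)
clt = AgglomerativeClustering(n_clusters=k, affinity='euclidean', linkage='ward')
res = clt.fit_predict(embeddings)

# scipy: cut the dendrogram at a distance threshold of 0.95
res = hierarchy.fclusterdata(embeddings, t=0.95, criterion='distance', method='ward', metric='euclidean')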

I pass the same data to both clustering calls this way; assume embeddings is a numpy array of timestamps.
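
One difference between the two calls that may explain the mismatch: the scikit-learn call fixes the number of clusters (n_clusters=k), while the scipy call cuts the dendrogram at a distance threshold (criterion='distance', t=0.95). Ward merge heights grow with cluster size, so a threshold of 0.95 does not correspond to any particular k. The sketch below is a minimal comparison, not a definitive diagnosis. It assumes synthetic, well-separated data as a stand-in for the real embeddings, asks scipy for at most k clusters with criterion='maxclust', and compares the partitions with adjusted_rand_score, because the two libraries number their labels differently (scikit-learn from 0, scipy from 1, in arbitrary order).

```python
import numpy as np
from scipy.cluster import hierarchy
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the speaker embeddings (an assumption, not the real data):
# k well-separated Gaussian blobs in 16 dimensions.
rng = np.random.default_rng(0)
k = 3
centers = rng.normal(scale=10.0, size=(k, 16))
embeddings = np.vstack([c + rng.normal(size=(40, 16)) for c in centers])

# scikit-learn: ask for exactly k clusters. Ward linkage uses Euclidean
# distance by default, so the affinity/metric argument is omitted here to
# stay compatible across scikit-learn versions.
sk_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(embeddings)

# scipy: criterion="maxclust" with t=k asks for at most k flat clusters,
# instead of cutting the tree at a fixed distance.
sp_labels = hierarchy.fclusterdata(
    embeddings, t=k, criterion="maxclust", method="ward", metric="euclidean"
)

# Label values differ between the libraries, so compare the partitions
# themselves; an adjusted Rand index of 1.0 means they are identical.
ari = adjusted_rand_score(sk_labels, sp_labels)
print("adjusted Rand index:", ari)
```

If the partitions agree on the real embeddings as well, the earlier discrepancy came from the threshold rather than from the metric. Comparing the raw label arrays element by element would still report a mismatch even when the clusterings are the same.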
