Reputation: 219
I am having trouble drawing a dendrogram with create_dendrogram from plotly.figure_factory.
The default linkage function (linkagefun) is complete, and the default distance function (distfun) is scs.distance.pdist (that is, scipy.spatial.distance.pdist), but I want jaccard for distfun and average for linkagefun.
The setup I want, written with SciPy and matplotlib, is shown below:
import pandas as pd
import numpy as np
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
plt.figure(figsize = (10, 10))
disMat = sch.distance.pdist(df, metric='jaccard')
# linkage expects the condensed distance vector returned by pdist;
# a square matrix would be treated as raw observations instead.
Z = sch.linkage(disMat, method='average')
Dend=sch.dendrogram(Z,orientation='right')
plt.tick_params(
    axis='y',
    which='both',
    direction='in',
    left=False,
    right=False,
    labelleft=False)
I noticed that linkagefun can be set with linkagefun=lambda x: sch.linkage(x, 'average'), but distfun can't be set with distfun='jaccard', and I don't know how to set it.
from plotly.figure_factory import create_dendrogram
fig = create_dendrogram(df, orientation='left',
                        labels=df.index,
                        distfun='jaccard',  # this is the part that does not work
                        linkagefun=lambda x: sch.linkage(x, 'average'))
fig.show()
An example df is below:
import pandas as pd
df = pd.DataFrame({'1-7':[0,0,1,1,0,1,1],'1-2':[1,0,1,0,0,1,1],'2-3':[1,0,0,0,1,1,0],'2-2':[0,1,0,1,0,1,1],'1-1':[1,0,0,1,0,1,0],'1-3':[0,1,1,1,0,0,0],'1-5':[0,1,0,1,1,0,1]},index=['a','b','c','d','e','f','g'])
Since I need Dash to plot the figure on a web page, it seems I have to use create_dendrogram in plotly.
Upvotes: 0
Views: 675
Reputation: 36
You can use partial from functools to "freeze" the metric parameter of scipy.spatial.distance.pdist, which specifies the distance metric.
from functools import partial
from scipy.spatial.distance import pdist
pw_jaccard_func = partial(pdist, metric='jaccard')
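To see what the partial does, here is a small sketch on a made-up boolean array (the array X is only an illustration, not the asker's data). It checks that calling the frozen function with just the data gives the same condensed distances as calling pdist with metric='jaccard' directly.

```python
from functools import partial

import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical toy data: 3 observations, 4 binary features.
X = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=bool)

# Freeze the metric so the result is a one-argument callable.
pw_jaccard_func = partial(pdist, metric='jaccard')

d_partial = pw_jaccard_func(X)          # called with the data only
d_direct = pdist(X, metric='jaccard')   # same call, spelled out

print(d_partial)
print(np.allclose(d_partial, d_direct))
```

A lambda such as lambda x: pdist(x, metric='jaccard') should behave the same way; partial is just a slightly more explicit spelling.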
Then pass the partial function as distfun:
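import scipy.cluster.hierarchy as sch
from plotly.figure_factory import create_dendrogram
fig = create_dendrogram(df, orientation='left',
                        labels=df.index,
                        distfun=pw_jaccard_func,
                        linkagefun=lambda x: sch.linkage(x, 'average'))
fig.show()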
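If you want to sanity-check the two callables without plotly, the sketch below chains them by hand on the values from the question's df (rows a to g). It assumes create_dendrogram applies distfun to the data first and then hands the result to linkagefun; that is my reading of how the two parameters fit together, so treat it as an assumption rather than a description of plotly's internals.

```python
from functools import partial

import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

# Rows a..g of the question's df, columns in the order
# '1-7', '1-2', '2-3', '2-2', '1-1', '1-3', '1-5'.
X = np.array([[0, 1, 1, 0, 1, 0, 0],
              [0, 0, 0, 1, 0, 1, 1],
              [1, 1, 0, 0, 0, 1, 0],
              [1, 0, 0, 1, 1, 1, 1],
              [0, 0, 1, 0, 0, 0, 1],
              [1, 1, 1, 1, 1, 0, 0],
              [1, 1, 0, 1, 0, 0, 1]], dtype=bool)

distfun = partial(pdist, metric='jaccard')
linkagefun = lambda d: sch.linkage(d, 'average')

# Assumed order of operations: distances first, then linkage.
d = distfun(X)      # condensed vector, one entry per pair of rows
Z = linkagefun(d)   # linkage matrix, one row per merge

# no_plot=True returns the dendrogram layout without drawing anything.
leaves = sch.dendrogram(Z, no_plot=True)['leaves']
print(d.shape, Z.shape, leaves)
```

If this runs cleanly, the same distfun and linkagefun objects can be passed to create_dendrogram unchanged.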
Upvotes: 2