Reputation: 357
I am trying to run this code:
import pandas as pd
import seaborn as sns
df = pd.DataFrame(clusters, columns=cols)
sns.clustermap(df, cmap="vlag", vmin=0, vmax=1, metric="correlation",
               z_score=None, standard_scale=None, yticklabels=True,
               figsize=(size, size))
The value of clusters is:
clusters = [[0.89463602, 0., 0., 0.85185185, 0.9023569, 0.,
             0., 0.83333333, 0., 0., 0.],
            [0.75, 0.66666667, 0., 0., 0.69444444, 0.,
             0.89272031, 0., 0.69444444, 0., 0.69444444],
            [0.85185185, 0.88910175, 0., 0., 0.9043771, 0.,
             0., 0., 0.89092141, 0.77777778, 0.69444444],
            [0.75, 0.89825458, 0., 0., 0.77777778, 0.,
             0.8908046, 0., 0.75, 0.91550069, 0.8]]
and I get the following error:
in linkage
linkage_wrap(N, X, Z, mthidx[method])
FloatingPointError: NaN dissimilarity value.
Any idea what causes it?
Upvotes: 4
Views: 3454
Reputation: 46898
Two of your columns (col2 and col5) are all zeros, so they have no variation at all. Correlation is undefined for a constant column, so the correlation metric returns NaN when the columns are clustered:
cols = ["col"+str(i) for i in range(11)]
df = pd.DataFrame(clusters, columns=cols)
df.corr()
col0 col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
col0 1.000000 -0.652805 NaN 0.755353 0.914034 NaN -0.971167 0.755353 -0.607892 -0.232318 -0.792705
col1 -0.652805 1.000000 NaN -0.967396 -0.353987 NaN 0.461102 -0.967396 0.982783 0.761192 0.976659
col2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
col3 0.755353 -0.967396 NaN 1.000000 0.537949 NaN -0.577350 1.000000 -0.978166 -0.573568 -0.990826
col4 0.914034 -0.353987 NaN 0.537949 1.000000 NaN -0.943651 0.537949 -0.352431 0.181392 -0.546475
col5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
col6 -0.971167 0.461102 NaN -0.577350 -0.943651 NaN 1.000000 -0.577350 0.401476 0.079648 0.627048
col7 0.755353 -0.967396 NaN 1.000000 0.537949 NaN -0.577350 1.000000 -0.978166 -0.573568 -0.990826
col8 -0.607892 0.982783 NaN -0.978166 -0.352431 NaN 0.401476 -0.978166 1.000000 0.665620 0.962359
col9 -0.232318 0.761192 NaN -0.573568 0.181392 NaN 0.079648 -0.573568 0.665620 1.000000 0.636492
col10 -0.792705 0.976659 NaN -0.990826 -0.546475 NaN 0.627048 -0.990826 0.962359 0.636492 1.000000
df[['col2','col5']]
col2 col5
0 0.0 0.0
1 0.0 0.0
2 0.0 0.0
3 0.0 0.0
You can either remove those columns before plotting, or use a metric that is still defined for constant columns, such as euclidean or canberra.
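For the first option, here is a minimal sketch of one way to drop the constant columns, using the data from the question. The names `varying` and `dropped` are just illustrative, and filtering on `nunique() > 1` is one possible approach, not the only one:

```python
import pandas as pd

clusters = [
    [0.89463602, 0., 0., 0.85185185, 0.9023569, 0.,
     0., 0.83333333, 0., 0., 0.],
    [0.75, 0.66666667, 0., 0., 0.69444444, 0.,
     0.89272031, 0., 0.69444444, 0., 0.69444444],
    [0.85185185, 0.88910175, 0., 0., 0.9043771, 0.,
     0., 0., 0.89092141, 0.77777778, 0.69444444],
    [0.75, 0.89825458, 0., 0., 0.77777778, 0.,
     0.8908046, 0., 0.75, 0.91550069, 0.8],
]
cols = ["col" + str(i) for i in range(11)]
df = pd.DataFrame(clusters, columns=cols)

# Keep only the columns that have more than one distinct value.
varying = df.loc[:, df.nunique() > 1]

# Columns that were removed because they are constant.
dropped = [c for c in df.columns if c not in varying.columns]
print(dropped)  # ['col2', 'col5']

# The column-by-column correlation matrix no longer contains NaN.
print(varying.corr().isna().any().any())  # False
```

Passing `varying` instead of `df` to `sns.clustermap`, with the same arguments as in the question, should then avoid the NaN dissimilarity, since every remaining column has some variation.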
Upvotes: 5