Reputation: 1015
I want to use SciPy's dendrogram. I have the following data:
I have a list of seven different means. For example:
Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]
Each mean is calculated for a different user. For example:
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]
My aim is to display the data described above as a dendrogram.
I tried the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]
# Attempt with matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)
Z = linkage(Y)
# Plot the dendrogram with the results above
dendrogram(Z, leaf_rotation=45., leaf_font_size=12., show_contracted=True)
plt.style.use("seaborn-whitegrid")
plt.title("Dendrogram to find clusters")
plt.ylabel("Distance")
plt.show()
But it says:
ValueError: Length n of condensed distance matrix 'y' must be a binomial coefficient, i.e.there must be a k such that (k \choose 2)=n)!
I already tried converting my data into a matrix with:
# Attempt with matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)
But that doesn't work either!
Are there any suggestions?
Thanks :-)
Upvotes: 0
Views: 5557
Reputation: 114781
The first argument of linkage is either an n x m array, representing n points in m-dimensional space, or a one-dimensional array containing the condensed distance matrix. These are two very different meanings! The first is the raw data, i.e. the observations. The second format assumes that you have already computed all the distances between your observations, and you are providing these distances to linkage, not the original points.
It looks like you want the first case (raw data), with m = 1. So you must reshape the input to have shape (n, 1).
Replace this:
Z = linkage(Y)
with:
Z = linkage(np.reshape(Y, (len(Y), 1)))
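For reference, here is a minimal sketch of the whole fix, assuming the Y and X lists from the question. It reshapes the means into an (n, 1) array, builds the linkage, and attaches the user names as leaf labels via dendrogram's labels argument. It uses no_plot=True only so the snippet runs without a display; the variable names obs and tree are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Data from the question: one mean per user.
Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756,
     31.594949722819372, 34.823881975554166, 28.36368420190157]
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]

# Treat each mean as a 1-dimensional observation: shape (n, 1) instead of (n,).
obs = np.reshape(Y, (len(Y), 1))

# linkage now computes the pairwise distances itself (Euclidean, single linkage by default).
Z = linkage(obs)

# no_plot=True returns the dendrogram layout without drawing it;
# labels maps each leaf back to its user name.
tree = dendrogram(Z, labels=X, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right plot order
```

To actually draw the tree, drop no_plot=True, keep labels=X together with the leaf_rotation and leaf_font_size arguments from the question, and finish with plt.show().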
Upvotes: 11
Reputation: 14001
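So you are passing 7 values in Y, i.e. len(Y) = 7.
But according to the documentation of linkage, a one-dimensional input is interpreted as a condensed distance matrix, so for some integer number of observations n its length must be such that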
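$\binom{n}{2} = \frac{n(n-1)}{2} = \mathrm{len}(Y)$
so the length of Y must be one of 1, 3, 6, 10, 15, 21, ... for n to be a valid integer. Since 7 is not of that form (n = 4 gives 6, n = 5 gives 10), linkage raises the error.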
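The length condition can be checked directly. Below is a small sketch: condensed_n_obs is a hypothetical helper (not part of SciPy) that solves n(n-1)/2 = length for an integer n; SciPy's own scipy.spatial.distance.num_obs_y performs a similar check and raises instead of returning None. The sketch also uses pdist to build a genuine condensed distance matrix from the question's means (assuming each mean is one 1-D observation), which has 21 entries for 7 users and gives the same tree as passing the reshaped observations.

```python
import math

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def condensed_n_obs(length):
    """Return n if length == n*(n-1)/2 for some integer n >= 1, else None."""
    # Positive root of n^2 - n - 2*length = 0, computed with an integer sqrt.
    n = (1 + math.isqrt(1 + 8 * length)) // 2
    return n if n * (n - 1) // 2 == length else None


Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756,
     31.594949722819372, 34.823881975554166, 28.36368420190157]

print(condensed_n_obs(len(Y)))  # None: 7 is not "n choose 2" for any integer n

# A real condensed distance matrix for 7 observations has 7*6/2 = 21 entries.
obs = np.reshape(Y, (len(Y), 1))
d = pdist(obs)  # Euclidean distance between every pair of means
print(len(d), condensed_n_obs(len(d)))  # 21 7

# Passing the condensed matrix or the (n, 1) observations yields the same linkage.
print(np.allclose(linkage(d), linkage(obs)))  # True
```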
Upvotes: 1