Jannik

Reputation: 1015

Python: Dendrogram with SciPy doesn't work

I want to use SciPy's dendrogram function. I have the following data:

I have a list with seven different means. For example:

Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]

Each mean is calculated for a different user. For example:

X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]

My aim is to display the data described above as a dendrogram.

I tried the following:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]

# Attempt with matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)

Z = linkage(Y)
# Plot the dendrogram with the results above
dendrogram(Z, leaf_rotation=45., leaf_font_size=12., show_contracted=True)
plt.style.use("seaborn-whitegrid")
plt.title("Dendrogram to find clusters")
plt.ylabel("Distance")
plt.show()

But it says:

ValueError: Length n of condensed distance matrix 'y' must be a binomial coefficient, i.e.there must be a k such that (k \choose 2)=n)!

I already tried to convert my data into a matrix with:

# Attempt with matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)

But that doesn't work either!

Are there any suggestions?

Thanks :-)

Upvotes: 0

Views: 5557

Answers (2)

Warren Weckesser

Reputation: 114781

The first argument of linkage is either an n x m array, representing n points in m-dimensional space, or a one-dimensional array containing the condensed distance matrix. These are two very different meanings! The first is the raw data, i.e. the observations. The second format assumes that you have already computed all the distances between your observations, and you are providing these distances to linkage, not the original points.
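As an illustration of the two input formats, here is a minimal sketch using the values from the question. It assumes `scipy.spatial.distance.pdist` with its default Euclidean metric to build the condensed distance matrix; the variable names are only for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Raw observations: n = 7 points in m = 1 dimension (values from the question).
points = np.array([71.407452200146807, 0, 33.700136456196823,
                   1112.3757110973756, 31.594949722819372,
                   34.823881975554166, 28.36368420190157]).reshape(-1, 1)

# Condensed distance matrix: one entry per pair of points,
# so n * (n - 1) / 2 = 21 entries for n = 7.
condensed = pdist(points)  # Euclidean distance by default

# Both calls describe the same clustering problem.
Z_from_points = linkage(points)        # format 1: raw observations, shape (n, m)
Z_from_distances = linkage(condensed)  # format 2: precomputed pairwise distances
```

With the default settings, both calls should produce the same linkage matrix, because the raw observations are converted to pairwise distances internally.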

It looks like you want the first case (raw data), with m = 1. So you must reshape the input to have shape (n, 1).

Replace this:

Z = linkage(Y)

with:

Z = linkage(np.reshape(Y, (len(Y), 1)))
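Put together with the data from the question, the fix could look like the sketch below. Passing `labels=X` to `dendrogram` is an addition not in the original code; it puts the user names on the leaves. `no_plot=True` is used here only so the example runs without a display.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756,
     31.594949722819372, 34.823881975554166, 28.36368420190157]
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]

# Reshape the 7 means into 7 observations with 1 feature each: shape (7, 1).
data = np.reshape(Y, (len(Y), 1))
Z = linkage(data)

# labels=X names the leaves; no_plot=True computes the layout without drawing.
tree = dendrogram(Z, labels=X, no_plot=True)
print(tree["ivl"])  # leaf labels in the order they appear along the axis
```

To actually draw the figure, drop `no_plot=True`, import `matplotlib.pyplot as plt`, and call `plt.show()` as in the question.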

Upvotes: 11

Vikash Singh

Reputation: 14001

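You are passing a one-dimensional Y with len(Y) = 7.

According to the documentation of linkage, a one-dimensional input is treated as a condensed distance matrix, so its length must satisfy

{n \choose 2} = len(Y)

which means

1/2 * (n - 1) * n = len(Y)

for some integer n, the number of original observations. No integer n gives 7 (n = 4 gives 6 and n = 5 gives 10), which is why the ValueError is raised.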
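The length condition can be checked directly. The helper below is hypothetical (it is not part of SciPy) and simply solves n * (n - 1) / 2 = len(Y) for n using integer arithmetic.

```python
from math import isqrt

def observations_for_condensed_length(length):
    """Return n such that n * (n - 1) / 2 == length, or None if no integer n exists."""
    # Positive root of n^2 - n - 2 * length = 0, rounded down to an integer.
    n = (1 + isqrt(1 + 8 * length)) // 2
    return n if n * (n - 1) // 2 == length else None

print(observations_for_condensed_length(7))   # None: 7 is not a valid condensed length
print(observations_for_condensed_length(21))  # 7: the pairwise distances of 7 points
```

So a list of 7 values cannot be a condensed distance matrix, while the 21 pairwise distances between 7 observations can.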

Upvotes: 1
