user3496060

Reputation: 856

"fast ward" clustering in Python

In JMP software there is an option to use the "fast Ward" method when the number of rows is greater than 2000. From the documentation [fast ward]:

"Applies an algorithm that computes Ward's method more quickly for large numbers of rows. The computation time is shorter because this algorithm does not require the calculation of a distance matrix. It is used automatically whenever there are more than 2,000 rows."

MATLAB offers something similar. From its documentation: "Find a maximum of four clusters in a hierarchical cluster tree created using the ward linkage method. Specify 'SaveMemory' as 'on' to construct clusters without computing the distance matrix. Otherwise, you can receive an out-of-memory error if your machine does not have enough memory to hold the distance matrix."

I'm looking for something similar in Python, but every option I've found seems to require the distance matrix to be calculated ahead of time (which requires absurd amounts of memory for my problem of 275k rows and 10 columns). In JMP/MATLAB, though, it works just fine on a machine with half the memory of the machine I want to run the Python script on. Does anybody know of something?
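For a rough sense of scale, here is a back-of-the-envelope sketch of the memory involved. It assumes 8-byte floats and a condensed (upper-triangle only) distance matrix; the helper names are made up for illustration.

```python
# Rough memory estimate; assumes float64 (8 bytes per value).
BYTES_PER_FLOAT = 8


def condensed_matrix_bytes(n_rows: int) -> int:
    """Bytes needed for the n*(n-1)/2 pairwise distances (upper triangle only)."""
    return n_rows * (n_rows - 1) // 2 * BYTES_PER_FLOAT


def raw_data_bytes(n_rows: int, n_cols: int) -> int:
    """Bytes needed for the raw observations themselves."""
    return n_rows * n_cols * BYTES_PER_FLOAT


n_rows, n_cols = 275_000, 10
print(f"distance matrix: {condensed_matrix_bytes(n_rows) / 1e9:.1f} GB")
print(f"raw data:        {raw_data_bytes(n_rows, n_cols) / 1e6:.1f} MB")
```

Under those assumptions the distance matrix alone comes to roughly 300 GB, while the data itself is about 22 MB, so anything that works directly on the rows should fit easily.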

Upvotes: 0

Views: 459

Answers (2)

Greg

Reputation: 1991

Have you tried fastcluster? It can build "hierarchical clusters from distance matrices or from vector data", so it does not necessarily need a precomputed distance matrix.

Upvotes: 0

Charles Duffy

Reputation: 295520

From a now-rolled-back edit to the question by the OP:

I found that using the "linkage_vector" option seems to be what I was looking for. I was thrown off because "vector" to me meant 1D, but I guess it can be N-D.

Upvotes: 1
