Joey

Reputation: 1349

Sparse Matrix as input to Hierarchical clustering in R

I have a question about clustering with a distance matrix that is sparse.

Is there a sparse distance object format that does not expand the matrix and can work with the sparse representation?

Currently I'm doing the following:

# load the Matrix package, which provides readMM()
library(Matrix)
# read sparse matrix
sparse <- readMM('sparse-matrix')
distance <- as.dist(sparse)

The file sparse-matrix already holds the correct distance matrix, with NAs for pairs that are not connected.

> sparse
[1,] . . .
[2,] 1 . .
[3,] 1 . .

> as.dist(sparse)
  1 2
2 1  
3 1 0

However, converting it with as.dist fails with:

Error in asMethod(object) : negative length vectors are not allowed

Presumably this is because as.dist expands the matrix to its full dense form. The matrix is N x N with N = 49281. A dist object is the format required by, for example, the hclust method.
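A rough size check makes this guess plausible. The sketch below only does arithmetic on N and does not inspect what as.dist does internally, so treat the link to the error message as an assumption: a dense N x N expansion would need more cells than a 32-bit integer length can count, and even the triangular dist form would take several GiB of doubles.

```r
N <- 49281

# Cells in a fully expanded N x N matrix (computed in double precision)
dense_cells <- N^2

# Largest value representable as a 32-bit signed integer in R
int_max <- .Machine$integer.max

# TRUE here means a dense expansion overflows a 32-bit length
overflows <- dense_cells > int_max

# A dist object stores only the lower triangle: N * (N - 1) / 2 doubles
dist_cells <- N * (N - 1) / 2
dist_gib   <- dist_cells * 8 / 2^30   # 8 bytes per double

print(overflows)
print(dist_gib)
```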

A similar question on the R help list went unanswered.

Upvotes: 3

Views: 1622

Answers (1)

cbeleites

Reputation: 14093

How would a distance matrix be sparse? There is a distance between every pair of objects, so the matrix is actually very dense. However, because the matrix is symmetric (D = D'), a triangular matrix is sufficient to describe the mutual distances. This is what the objects produced by dist store.
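As a small illustration of the triangular storage, here is a sketch on three made-up points (the data in `x` is hypothetical and chosen only for the example). The dist object holds n(n-1)/2 values, and expanding it gives a symmetric matrix with a zero diagonal.

```r
# Three hypothetical 2-D points: (0,0), (3,4), (6,8)
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

d <- dist(x)          # Euclidean distances, lower triangle only
n <- nrow(x)

length(d)             # number of stored distances: n * (n - 1) / 2

m <- as.matrix(d)     # expand to the full square form
isSymmetric(unname(m))
diag(m)
```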

If the distance matrix is sparse because lots of objects are the same, then maybe you'd want to calculate the distance matrix only on unique objects.
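If duplicates are indeed the cause, the idea could be sketched as follows. This assumes the original observations are available as rows of a matrix; `x`, the `key` helper, and the choice of k = 2 are hypothetical and only there to make the example runnable.

```r
# Hypothetical data: 6 observations, only 3 distinct rows
x <- rbind(c(0, 0), c(0, 0), c(1, 1), c(1, 1), c(5, 5), c(0, 0))

ux <- unique(x)       # keep one copy of each distinct row
d  <- dist(ux)        # distance matrix on the unique rows only
hc <- hclust(d)

# Map every original row back to its unique representative
key <- function(m) apply(m, 1, paste, collapse = "\r")
idx <- match(key(x), key(ux))

# Cut the tree on the unique rows, then expand to all observations
clusters <- cutree(hc, k = 2)[idx]
```

Duplicated observations end up in the same cluster by construction, and the dist object shrinks from the number of observations to the number of distinct rows.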

Upvotes: -5
