I have a question about clustering with a sparse distance matrix.
Is there a sparse distance object format that does not expand the matrix and can work with the sparse representation?
Currently I'm doing the following:

```r
library(Matrix)  # provides readMM()

# read sparse matrix (Matrix Market format)
sparse <- readMM('sparse-matrix')
distance <- as.dist(sparse)
```
`sparse-matrix` already contains the correct distance matrix, with NAs for entries that are not connected.
```
> sparse
[1,] . . .
[2,] 1 . .
[3,] 1 . .
> as.dist(sparse)
  1 2
2 1
3 1 0
```
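The small example can be rebuilt without the input file using `Matrix::sparseMatrix`. This is a sketch that assumes the 3 x 3 matrix printed above, with the two stored entries read off that printout. It also shows a pitfall visible in the output: entries that are not stored (printed as `.`) are zeros, not NA, so after `as.dist` the unconnected pair (objects 2 and 3) ends up at distance 0.

```r
library(Matrix)

# Rebuild the 3 x 3 example: only (row 2, col 1) and (row 3, col 1) are stored.
sparse <- sparseMatrix(i = c(2, 3), j = c(1, 1), x = c(1, 1), dims = c(3, 3))

# as.dist() densifies the matrix and keeps the strict lower triangle.
d <- as.dist(sparse)
print(d)

# Lower triangle, column by column: d(2,1), d(3,1), d(3,2).
# The unstored entry d(3,2) comes out as 0, not NA.
as.vector(d)
```

For a clustering, that 0 would tell `hclust` that objects 2 and 3 are identical, so unconnected pairs probably need an explicit large distance instead.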
But on the full matrix, converting it with as.dist fails with:

```
Error in asMethod(object) : negative length vectors are not allowed
```

Presumably this is because it expands the matrix to its complete, dense form. The matrix is N x N with N = 49281. I need this format (a dist object) for, for example, the hclust method.
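The sizes involved can be checked with a little arithmetic. This sketch assumes, as the question guesses, that the dense N x N expansion is what fails; "negative length vectors" is the usual symptom of a length that overflowed a 32-bit signed integer. It also shows how large the dist object itself would be, since a dist stores only the lower triangle.

```r
N <- 49281

# Cells in the full N x N matrix (computed as a double, so no overflow here).
full_cells <- N^2
full_cells                                # 2428616961
full_cells > .Machine$integer.max         # TRUE: exceeds 2^31 - 1

# A "dist" object stores only the N * (N - 1) / 2 lower-triangle entries.
dist_cells <- N * (N - 1) / 2
dist_cells                                # 1214283840
dist_cells <= .Machine$integer.max        # TRUE: the length itself fits

# Approximate memory for those entries as doubles (8 bytes each), in GiB.
dist_cells * 8 / 1024^3
```

So even if the conversion avoided the dense step, the resulting dist object would still need roughly 9 GiB of doubles.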
A similar question on the R help list received no answer.
---
How would a distance matrix be sparse? There is a distance between every pair of objects, so it is actually a very dense matrix. However, because the matrix is symmetric (D = D'), a triangular matrix is sufficient to describe the mutual distances. This is exactly what the objects produced by dist store.
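A quick way to see the triangular storage is to inspect a small dist object. The five 2-D points below are hypothetical, chosen only for illustration.

```r
# Five hypothetical 2-D points.
x <- matrix(c(0, 0,
              3, 4,
              6, 8,
              1, 1,
              2, 2), ncol = 2, byrow = TRUE)
d <- dist(x)

# Only the strict lower triangle is stored: n * (n - 1) / 2 = 10 values, not 25.
length(d)
attr(d, "Size")

# Expanding back to a full matrix shows it is symmetric (D = D').
D <- as.matrix(d)
isSymmetric(D)
```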
If the distance matrix is sparse because many objects are identical, you may want to compute the distance matrix only on the unique objects.
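A minimal sketch of that idea, assuming the objects are rows of a feature matrix (here hypothetical random binary data, so at most 8 distinct rows exist). Each unique row is clustered once, `hclust`'s `members` argument carries how many original objects it stands for, and the cluster labels are copied back to the duplicates.

```r
set.seed(42)

# Hypothetical data: 1000 objects with 3 binary features, so at most
# 2^3 = 8 distinct objects exist and most rows are exact duplicates.
x <- matrix(sample(0:1, 1000 * 3, replace = TRUE), ncol = 3)

# Identify each row by a string key and keep one representative per key.
key  <- apply(x, 1, paste, collapse = ",")
keep <- !duplicated(key)
ux   <- x[keep, , drop = FALSE]

# For every original row, the index of its representative among the unique rows.
map <- match(key, key[keep])

# Cluster only the unique objects; `members` gives the multiplicity of each
# unique row, which matters for methods such as "average".
hc <- hclust(dist(ux), method = "average",
             members = tabulate(map, nbins = nrow(ux)))

# Cut the tree, then copy each unique row's label to all of its duplicates.
cl <- cutree(hc, k = 3)[map]
table(cl)
```

The dist object then has only `nrow(ux) * (nrow(ux) - 1) / 2` entries, however many duplicates the original data contains.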