Oliver

Reputation: 1885

Proximity matrix in Python

What is the best way to compute the distance/proximity matrix for very large sparse vectors? For example, suppose you are given the following design matrix, where each row is a 68771-dimensional sparse vector.

designMatrix <5830x68771 sparse matrix of type '' with 1229041 stored elements in Compressed Sparse Row format>

Upvotes: 1

Views: 1500

Answers (1)

JoshAdel

Reputation: 68682

Have you tried the routines in scipy.spatial.distance?

http://docs.scipy.org/doc/scipy/reference/spatial.distance.html
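As a rough illustration of what that looks like, here is a minimal sketch using `pdist` and `squareform`. The small random matrix `X` is only a hypothetical stand-in for `designMatrix`, and the Euclidean metric is an assumption since the question does not name one. Note the `.toarray()` call: at 5830 x 68771 in float64 the dense copy would be roughly 3.2 GB, so this is probably only practical if that fits in memory.

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for designMatrix: a small random CSR matrix.
X = sparse.random(50, 200, density=0.05, format="csr", random_state=0)

# pdist is given a dense 2-D array here, so the sparse matrix is densified first.
# The metric is an assumption; pdist also accepts e.g. "cosine" or "cityblock".
condensed = pdist(X.toarray(), metric="euclidean")

# squareform expands the condensed vector into a symmetric n x n matrix.
D = squareform(condensed)
```

`condensed` holds only the n*(n-1)/2 upper-triangle distances, which is worth keeping if the full square matrix is not needed.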

If using them forces you to convert to a dense representation, then you may be better off rolling your own, depending on the density of nonzero elements. One option is to squeeze out the zeros while retaining a map between the new and original indices, calculate the pairwise distances on the remaining nonzero elements, and then use that index map to map things back.
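One possible reading of this suggestion is sketched below. It is not necessarily what was meant, just an illustration under some assumptions: the metric is Euclidean, "squeezing out the zeros" means dropping columns that are zero in every row, and the distances are then obtained from sparse dot products via the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b (a swapped-in technique, chosen so the input never has to be densified). The helper names `squeeze_zero_columns` and `euclidean_distance_matrix` are made up for this example.

```python
import numpy as np
from scipy import sparse


def squeeze_zero_columns(X):
    """Drop columns with no nonzero entries; return the reduced matrix and an index map."""
    X = sparse.csc_matrix(X, copy=True)
    X.eliminate_zeros()                     # remove explicitly stored zeros
    nnz_per_col = np.diff(X.indptr)         # stored entries per column in CSC
    kept = np.flatnonzero(nnz_per_col > 0)  # kept[j_new] == j_original
    return X[:, kept].tocsr(), kept


def euclidean_distance_matrix(X):
    """Pairwise Euclidean distances between rows, keeping the input sparse."""
    X = sparse.csr_matrix(X)
    sq_norms = np.asarray(X.multiply(X).sum(axis=1)).ravel()
    gram = (X @ X.T).toarray()              # n x n result is dense, the input is not
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    np.maximum(d2, 0.0, out=d2)             # clip tiny negatives from round-off
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(d2)


# Hypothetical stand-in for designMatrix.
X = sparse.random(40, 300, density=0.02, format="csr", random_state=1)
Xr, kept = squeeze_zero_columns(X)
D = euclidean_distance_matrix(Xr)
```

Dropping all-zero columns leaves Euclidean distances unchanged, and `kept` maps each remaining column back to its original index. Only the n x n output is dense; for 5830 rows that is about 270 MB in float64.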

Upvotes: 1
