JnBrymn

Reputation: 25373

How to deal with giant sparse matrices?

Can someone point me in the right direction? I'm looking to do some heavy-duty manipulation of some really large, often very sparse matrices, and I'm looking for the right tool for the job. These matrices will be much, much larger than the RAM of any single machine and will therefore likely be spread across several machines. I will want to perform all of the common matrix operations: multiplication, transpose, inverse, pseudo-inverse, SVD, eigenvalue decomposition, etc. Key among my concerns is that since the matrices will very likely be spread among several machines, I will want to minimize data movement, because network latency is probably my biggest enemy. I'm concerned that map-reduce (a la Hadoop) is not the right option because its focus is on streaming large amounts of data between machines. This book provides a great intro to map-reduce from an algorithmic perspective. And many matrix operations are akin to giant JOIN operations, which are known to be slow on map-reduce; the sketch below shows what I mean.
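To make the JOIN analogy concrete, here is a minimal single-machine sketch (my own illustration, not from the book; the class and record names are made up). Sparse matrix multiplication C = A * B is structurally a hash join of A's and B's nonzero triples on the shared inner index k; in a map-reduce setting that join becomes a shuffle keyed on k, which is exactly the network-heavy step I'm worried about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: sparse matmul as a join.
 * C[i][j] = sum_k A[i][k] * B[k][j], so every A triple (i, k, v) must
 * meet every B triple (k, j, w) sharing its k -- a hash join on k.
 */
public class SparseMatmulAsJoin {

    /** One nonzero entry of a sparse matrix, stored as a triple. */
    record Entry(long row, long col, double value) {}

    static List<Entry> multiply(List<Entry> a, List<Entry> b) {
        // Build side of the join: index B's triples by their row index k.
        Map<Long, List<Entry>> bByRow = new HashMap<>();
        for (Entry e : b) {
            bByRow.computeIfAbsent(e.row(), r -> new ArrayList<>()).add(e);
        }

        // Probe side: each A triple (i, k, v) joins with B triples (k, j, w),
        // accumulating partial products v * w into cell (i, j).
        Map<String, Double> acc = new HashMap<>();
        for (Entry e : a) {
            for (Entry f : bByRow.getOrDefault(e.col(), List.of())) {
                acc.merge(e.row() + "," + f.col(), e.value() * f.value(), Double::sum);
            }
        }

        // Collect the accumulated sums back into result triples.
        List<Entry> c = new ArrayList<>();
        acc.forEach((key, v) -> {
            String[] ij = key.split(",");
            c.add(new Entry(Long.parseLong(ij[0]), Long.parseLong(ij[1]), v));
        });
        return c;
    }
}
```

On a cluster, the `bByRow` lookup table can't fit in one machine's memory, so the "join on k" turns into a shuffle that moves every partial product across the network, which is why I suspect map-reduce is a poor fit here.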

So... where should I go?

Upvotes: 1

Views: 891

Answers (1)

Aravind Yarram

Reputation: 80194

This paper, Design of Hadoop-based Large-Scale Matrix Computations, can help you with implementation guidelines. HBase is designed for storing sparse tables, so it might be a good storage option for the matrices.
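For illustration, here is a rough sketch of one possible layout (my own, not a schema from the paper): one HBase row per matrix row, one column qualifier per nonzero column index. Absent cells cost nothing to store, which is what makes HBase a fit for sparse data. The table name "matrix" and column family "v" are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SparseMatrixHBase {
    private static final byte[] FAMILY = Bytes.toBytes("v");

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("matrix"))) {

            // Store one nonzero entry: A[42][1000007] = 3.14.
            Put put = new Put(Bytes.toBytes(42L));           // row key = row index
            put.addColumn(FAMILY, Bytes.toBytes(1000007L),   // qualifier = column index
                          Bytes.toBytes(3.14d));
            table.put(put);

            // Read it back; a missing cell is simply absent (an implicit zero).
            Result result = table.get(new Get(Bytes.toBytes(42L)));
            byte[] cell = result.getValue(FAMILY, Bytes.toBytes(1000007L));
            double value = (cell == null) ? 0.0 : Bytes.toDouble(cell);
            System.out.println("A[42][1000007] = " + value);
        }
    }
}
```

With this layout a full matrix row comes back in one Get, and row-oriented operations (like the A side of a multiplication) scan contiguous regions, which keeps reads local to the region servers holding them.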

Upvotes: 1
