Reputation: 1329
We need to multiply a large matrix with a one-dimensional vector. The large matrix is sparse. In a second scenario, we need to multiply two large matrices, both of which are sparse. And in the third scenario, we need to multiply two large matrices both of which are dense.
Apache Spark seems to provide a built-in data type for matrices (including a specialized one for sparse matrices) as well as what seems to be a very rich set of libraries for matrix linear algebra (multiplication, addition, transposition, etc.).
How can one efficiently do matrix multiplication (or other linear algebra operations on matrices) on Google Cloud Dataflow for the three scenarios described above?
Upvotes: 1
Views: 717
Reputation: 6130
Dataflow currently doesn't support matrix operations natively. That said, it should be possible to implement these operations in a way similar to Spark.
For sparse matrices, it should be possible to key by the (x,y) coordinate, and then do a GroupByKey.
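To make the keyed approach concrete, here is a minimal sketch in plain Python rather than the Dataflow SDK. It assumes each sparse matrix is stored as a dict of `(row, col) -> value`, and the helper `group_by_key` is a hypothetical local stand-in for a GroupByKey step. It is one possible way to arrange the keys (first by the shared inner index, then by the output coordinate), not necessarily the only one.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Hypothetical local stand-in for a GroupByKey step: collect values per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def sparse_matmul(a, b):
    """Multiply two sparse matrices given as {(row, col): value} dicts."""
    # Step 1: key every entry by the inner index k shared by A[i, k] and B[k, j].
    tagged = [(k, ("A", i, v)) for (i, k), v in a.items()]
    tagged += [(k, ("B", j, v)) for (k, j), v in b.items()]

    # Step 2: for each inner index, emit partial products keyed by output (i, j).
    partials = []
    for entries in group_by_key(tagged).values():
        left = [(i, v) for tag, i, v in entries if tag == "A"]
        right = [(j, v) for tag, j, v in entries if tag == "B"]
        for i, a_val in left:
            for j, b_val in right:
                partials.append(((i, j), a_val * b_val))

    # Step 3: group by output coordinate and sum the partial products.
    return {coord: sum(vals) for coord, vals in group_by_key(partials).items()}


if __name__ == "__main__":
    a = {(0, 0): 2, (0, 2): 1, (1, 1): 3}   # 2 x 3 matrix
    b = {(0, 1): 4, (1, 1): 6, (2, 0): 5}   # 3 x 2 matrix
    print(sparse_matmul(a, b))
```

The sparse matrix times vector case fits the same shape if the vector is stored as a single-column matrix, for example `{(k, 0): value}`.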
For dense matrices, you can divide the matrix into blocks, use a GroupByKey to group the blocks, and then use a native library (such as BLAS) to implement the multiplication on the blocks.
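The block approach can be sketched the same way. The following is a plain-Python illustration, not Dataflow SDK code: matrices are assumed to be lists of rows, all function names are hypothetical, and `block_product` is a pure-Python stand-in for the native (for example BLAS-backed) routine you would call on each pair of blocks in practice.

```python
from collections import defaultdict


def split_blocks(m, size):
    """Split a dense matrix (list of rows) into {(block_row, block_col): block}."""
    blocks = {}
    for r in range(0, len(m), size):
        for c in range(0, len(m[0]), size):
            blocks[(r // size, c // size)] = [row[c:c + size] for row in m[r:r + size]]
    return blocks


def block_product(x, y):
    """Stand-in for a native dense multiply on one pair of blocks."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def block_add(x, y):
    """Element-wise sum of two equally shaped blocks."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def blocked_matmul(a, b, size):
    """Multiply dense matrices by grouping block products per output block."""
    a_blocks, b_blocks = split_blocks(a, size), split_blocks(b, size)

    # Group the products A[i, k] * B[k, j] under the output block key (i, j).
    grouped = defaultdict(list)
    for (i, k), a_blk in a_blocks.items():
        for (k2, j), b_blk in b_blocks.items():
            if k == k2:
                grouped[(i, j)].append(block_product(a_blk, b_blk))

    # Sum each group and copy the block into its place in the result.
    result = [[0] * len(b[0]) for _ in range(len(a))]
    for (i, j), products in grouped.items():
        total = products[0]
        for p in products[1:]:
            total = block_add(total, p)
        for r, row in enumerate(total):
            for c, val in enumerate(row):
                result[i * size + r][j * size + c] = val
    return result


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
    print(blocked_matmul(a, b, size=2))
```

In a distributed pipeline the block size would be a tuning choice: larger blocks mean fewer shuffled elements and more work per native call, while smaller blocks spread the work across more workers.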
See BlockMatrix for more information on how the block operations are implemented in Spark.
Upvotes: 3