Jonathan Sylvester

Reputation: 1329

How to efficiently do large matrix multiplications on Google Cloud Dataflow?

We need to multiply a large sparse matrix by a one-dimensional vector. In a second scenario, we need to multiply two large matrices, both of which are sparse. In a third scenario, we need to multiply two large matrices, both of which are dense.

Apache Spark seems to provide built-in data types for matrices (including a specialized one for sparse matrices), as well as what seems to be a very rich set of linear algebra routines for matrices (multiplication, addition, transposition, etc.).

How can one efficiently do matrix multiplications (or other linear algebra operations on matrices) on Google Cloud Dataflow for the three scenarios described above?

Upvotes: 1

Views: 717

Answers (2)

Narek

Reputation: 616

The following method should work on Dataflow.

Upvotes: 0

Ben Chambers

Reputation: 6130

Dataflow currently doesn't support matrix operations natively. That said, it should be possible to implement these operations in a similar way to Spark.

For sparse matrices, it should be possible to key each partial product by its (x, y) output coordinate, and then do a GroupByKey to sum the products that land on the same coordinate.
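A minimal sketch of that keyed approach, written in plain Python rather than against the Dataflow/Beam API. The helper `group_by_key` is hypothetical and only mimics what a GroupByKey does in a pipeline; `sparse_multiply` is likewise an illustrative name. Matrices are assumed to be lists of (row, col, value) triples for the nonzero entries. The first grouping step (joining A's column index with B's row index, which a pipeline would do with a CoGroupByKey) is an assumption about how the partial products get produced; the answer only describes the final keying by (x, y). A vector is handled as an n×1 matrix, which covers the first scenario as well.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Hypothetical stand-in for a pipeline GroupByKey: key -> list of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def sparse_multiply(a_entries, b_entries):
    """Multiply sparse A and B given as (row, col, value) triples.

    Returns a dict mapping (row, col) -> value for every output cell that
    received at least one partial product.
    """
    # Step 1: key A's entries by their column k and B's entries by their row k,
    # then group, so entries that share the inner index k end up together.
    tagged = [(k, ("A", i, v)) for i, k, v in a_entries]
    tagged += [(k, ("B", j, v)) for k, j, v in b_entries]
    by_k = group_by_key(tagged)

    # Step 2: for each shared k, emit partial products A[i, k] * B[k, j] keyed by (i, j).
    partials = []
    for values in by_k.values():
        a_side = [(i, v) for tag, i, v in values if tag == "A"]
        b_side = [(j, v) for tag, j, v in values if tag == "B"]
        for i, a_val in a_side:
            for j, b_val in b_side:
                partials.append(((i, j), a_val * b_val))

    # Step 3: group the partial products by output coordinate (i, j) and sum them.
    return {coord: sum(vals) for coord, vals in group_by_key(partials).items()}


# Usage: A = [[2, 0, 1], [0, 3, 0]] times the vector x = [1, 2, 3] (as a 3x1 matrix).
A = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0)]
x = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 3.0)]
print(sparse_multiply(A, x))  # {(0, 0): 5.0, (1, 0): 6.0}
```

In a real pipeline, the final sum could be done with a combining transform instead of a plain GroupByKey, so that partial sums are computed before the shuffle. Note that a single inner index k with many nonzeros on both sides produces many partial products, so skewed sparsity patterns may need extra care.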

For dense matrices, you can divide the matrix into blocks, use a GroupByKey to group the blocks, and then use a native library (such as BLAS) to implement the multiplication on the blocks.
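The block approach can be sketched locally as follows, again in plain Python rather than the Dataflow API. It relies on the block identity C[I, J] = sum over K of A[I, K] · B[K, J]. The helper names `to_blocks`, `block_multiply`, and `from_blocks` are hypothetical; the dictionary grouping stands in for the GroupByKey step, and NumPy's `@` operator stands in for a direct BLAS call (NumPy typically dispatches float matrix products to a BLAS library).

```python
from collections import defaultdict

import numpy as np


def to_blocks(matrix, block_size):
    """Split a dense 2-D array into {(block_row, block_col): sub-array}."""
    rows, cols = matrix.shape
    return {
        (r // block_size, c // block_size): matrix[r:r + block_size, c:c + block_size]
        for r in range(0, rows, block_size)
        for c in range(0, cols, block_size)
    }


def block_multiply(a_blocks, b_blocks):
    """Compute C[I, J] = sum over K of A[I, K] @ B[K, J], block by block."""
    # Index B's blocks by their block-row K so they can be paired with A[I, K].
    b_by_row = defaultdict(list)
    for (k, j), b_block in b_blocks.items():
        b_by_row[k].append((j, b_block))

    # Group every (A block, B block) pair that contributes to output block (I, J);
    # this grouping is the role a GroupByKey would play in a pipeline.
    grouped = defaultdict(list)
    for (i, k), a_block in a_blocks.items():
        for j, b_block in b_by_row.get(k, []):
            grouped[(i, j)].append((a_block, b_block))

    # Multiply each pair (NumPy/BLAS does the dense work) and sum per output block.
    return {key: sum(a @ b for a, b in pairs) for key, pairs in grouped.items()}


def from_blocks(blocks, block_size, shape):
    """Reassemble {(block_row, block_col): sub-array} into one dense array."""
    result = np.zeros(shape)
    for (i, j), block in blocks.items():
        r, c = i * block_size, j * block_size
        result[r:r + block.shape[0], c:c + block.shape[1]] = block
    return result
```

Usage: `from_blocks(block_multiply(to_blocks(a, 2), to_blocks(b, 2)), 2, (a.shape[0], b.shape[1]))` should match `a @ b`, including when the dimensions are not multiples of the block size. The block size is the main tuning knob: larger blocks give BLAS more work per call, while smaller blocks spread the work across more workers.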

See BlockMatrix for more information on how the block operations are implemented in Spark.

Upvotes: 3
