Xiaoyu Chen

Reputation: 335

Converting CoordinateMatrix to Array?

I created a CoordinateMatrix:

import org.apache.spark.mllib.linalg.distributed.{
  CoordinateMatrix, MatrixEntry}
val entries = sc.parallelize(Seq(
  MatrixEntry(0, 1, 1), MatrixEntry(0, 2, 2), MatrixEntry(0, 3, 3), 
  MatrixEntry(0, 4, 4), MatrixEntry(2, 3, 5), MatrixEntry(2, 4, 6),
  MatrixEntry(3, 4, 7)))
val mat: CoordinateMatrix = new CoordinateMatrix(entries)

which is

0 1 2 3 4
0 0 0 0 0
0 0 0 5 6
0 0 0 0 7

Now I want to print this matrix. I first convert it to an IndexedRowMatrix (the order of the rows matters to me, and I cannot lose any row of the matrix):

scala> mat.toIndexedRowMatrix.rows.collect.sortBy(_.index)
res8: Array[org.apache.spark.mllib.linalg.distributed.IndexedRow] = 
    Array(IndexedRow(0,(5,[1,2,3,4],[1.0,2.0,3.0,4.0])), IndexedRow(2,(5,[3,4],[5.0,6.0])), IndexedRow(3,(5,[4],[7.0])))

But in this result the second row (index 1) is dropped because all of its entries are 0, so I cannot go on to print the matrix (or convert it to Array[Array[Double]]). I don't know how to deal with this. Thank you.

Upvotes: 1

Views: 922

Answers (1)

zero323

Reputation: 330413

In general, if you need a distributed matrix, then collecting and printing it is simply not an option. Still, you can convert your data to a BlockMatrix and collect it as a local DenseMatrix as follows:

mat.toBlockMatrix.toLocalMatrix
// res1: org.apache.spark.mllib.linalg.Matrix = 
// 0.0  1.0  2.0  3.0  4.0  
// 0.0  0.0  0.0  0.0  0.0  
// 0.0  0.0  0.0  5.0  6.0  
// 0.0  0.0  0.0  0.0  7.0
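
If you need an Array[Array[Double]] rather than a local Matrix, one possible approach is to reshape the matrix's flat value array yourself. The sketch below assumes the values are in column-major order, which is what toArray on a local mllib Matrix returns. The helper name columnMajorToRows is hypothetical and not part of Spark. It is plain Scala so it can be pasted into spark-shell or the Scala REPL, and the sample values are the question's 4 x 5 matrix written out by hand:

```scala
// Hypothetical helper (not part of Spark): reshape a column-major value
// array into one Array[Double] per row, keeping all-zero rows.
def columnMajorToRows(
    numRows: Int,
    numCols: Int,
    values: Array[Double]): Array[Array[Double]] = {
  require(values.length == numRows * numCols,
    "values must hold numRows * numCols elements")
  // In column-major order, element (i, j) sits at offset j * numRows + i.
  Array.tabulate(numRows, numCols)((i, j) => values(j * numRows + i))
}

// With Spark available, the call would look roughly like this:
//   val local = mat.toBlockMatrix.toLocalMatrix
//   val rows  = columnMajorToRows(local.numRows, local.numCols, local.toArray)

// The question's matrix, written out column by column for illustration.
val values = Array(
  0.0, 0.0, 0.0, 0.0,  // column 0
  1.0, 0.0, 0.0, 0.0,  // column 1
  2.0, 0.0, 0.0, 0.0,  // column 2
  3.0, 0.0, 5.0, 0.0,  // column 3
  4.0, 0.0, 6.0, 7.0)  // column 4

val rows = columnMajorToRows(4, 5, values)

// Print one row per line; the all-zero row at index 1 is preserved.
rows.foreach(r => println(r.mkString(" ")))
```

As with toLocalMatrix itself, this pulls every element onto the driver, so it is only reasonable for matrices small enough to fit in local memory.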

Upvotes: 2
