PRIYA M
PRIYA M

Reputation: 181

Convert org.apache.spark.mllib.linalg.Matrix to spark dataframe in Scala

I have an input dataframe input_df as:

+---------------+--------------------+
|Main_CustomerID|              Vector|
+---------------+--------------------+
|         725153|[3.0,2.0,6.0,0.0,9.0|
|         873008|[4.0,1.0,0.0,1.0,...|
|         625109|[1.0,0.0,6.0,1.0,...|
|         817171|[0.0,4.0,0.0,7.0,...|
|         611498|[1.0,0.0,4.0,5.0,...|
+---------------+--------------------+

The input_df is of schema type,

root
 |-- Main_CustomerID: integer (nullable = true)
 |-- Vector: vector (nullable = true)

By referring to Calculate Cosine Similarity Spark Dataframe, I have created the indexed row matrix and then I do:

val lm = irm.toIndexedRowMatrix.toBlockMatrix.toLocalMatrix 

to find cosine similarity between columns. Now I have a resultant mllib matrix,

cosineSimilarity: org.apache.spark.mllib.linalg.Matrix =
0.0  0.4199605255658081  0.5744269579035528  0.22075539284417395  0.561434614044346
0.0  0.0                 0.2791452631195413  0.7259079527665503   0.6206918387272496
0.0  0.0                 0.0                 0.31792539222893695  0.6997167152675132
0.0  0.0                 0.0                 0.0                  0.6776404124278828
0.0  0.0                 0.0                 0.0                  0.0

Now, I need to convert this lm which is of type org.apache.spark.mllib.linalg.Matrix into a dataframe. I expect my output dataframe to look as follows:

+---+------------------+------------------+-------------------+------------------+
| _1|                _2|                _3|                 _4|                _5|
+---+------------------+------------------+-------------------+------------------+
|0.0|0.4199605255658081|0.5744269579035528|0.22075539284417395| 0.561434614044346|
|0.0|               0.0|0.2791452631195413| 0.7259079527665503|0.6206918387272496|
|0.0|               0.0|               0.0|0.31792539222893695|0.6997167152675132|
|0.0|               0.0|               0.0|                0.0|0.6776404124278828|
|0.0|               0.0|               0.0|                0.0|               0.0|
+---+------------------+------------------+-------------------+------------------+

How can I do this in Scala?

Upvotes: 1

Views: 984

Answers (1)

Shaido
Shaido

Reputation: 28322

To convert the Matrix to a dataframe as specified, do the following. It first converts the matrix to a dataframe containing a single column with an array. Then foldLeft is used to break the array into separate columns.

import spark.implicits._
val cols = (0 until lm.numCols).toSeq

val df = lm.transpose
  .colIter.toSeq
  .map(_.toArray)
  .toDF("arr")

val df2 = cols.foldLeft(df)((df, i) => df.withColumn("_" + (i+1), $"arr"(i)))
  .drop("arr")

Upvotes: 2

Related Questions