Row/column names for correlation matrix values in Spark

Question

I calculated a correlation matrix in Spark and want to extract the individual correlations together with the names of the columns they belong to.

Correlation Matrix

correlMatrix: org.apache.spark.mllib.linalg.Matrix = 
1.0                   -0.33333333333333254  -0.8164965809277261  -0.7777777777777787   
-0.33333333333333254  1.0                   0.8164965809277356   -0.33333333333333254  
-0.8164965809277261   0.8164965809277356    1.0                  0.27216552697591645   
-0.7777777777777787   -0.33333333333333254  0.27216552697591645  1.0

DataFrame column names

colNames: Array[String] = Array(item_1, item_2, item_3, item_4)

Now I want to extract each combination into a DataFrame with the following structure:

item_from | item_to | Correlation
item_1    | item_2  | -0.0096912
item_1    | item_3  | -0.7313071
item_2    | item_3  | 0.68910356

Or at least the whole correlation matrix with column names:

           item_1                item_2                item_3               item_4
item_1     1.0                   -0.33333333333333254  -0.8164965809277261  -0.7777777777777787
item_2     -0.33333333333333254  1.0                   0.8164965809277356   -0.33333333333333254
item_3     -0.8164965809277261   0.8164965809277356    1.0                  0.27216552697591645
item_4     -0.7777777777777787   -0.33333333333333254  0.27216552697591645  1.0

I've tried to write a map function but it didn't work as I expected.

Is there any solution you could suggest?

Answers (1)
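
One possible approach, as a minimal sketch rather than a definitive solution: because the matrix rows and columns follow the order of colNames, you can walk the upper triangle by index and look up the names as you go. The sketch below assumes Spark 2.x or later (it uses SparkSession; on 1.x you would import sqlContext.implicits._ instead). The object name CorrelationPairs and the helpers namedPairs and labelledRows are hypothetical names chosen for illustration, and the matrix is rebuilt with Matrices.dense from the values in the question only so that the example is self-contained.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import org.apache.spark.sql.SparkSession

object CorrelationPairs {

  // Hypothetical helper: one (from, to, value) triple per pair above the diagonal (i < j).
  def namedPairs(m: Matrix, names: Array[String]): Seq[(String, String, Double)] = {
    require(m.numRows == names.length && m.numCols == names.length,
      "matrix must be square and have one row/column per name")
    for {
      i <- 0 until m.numRows
      j <- (i + 1) until m.numCols
    } yield (names(i), names(j), m(i, j))
  }

  // Hypothetical helper: the full matrix, one labelled row at a time.
  def labelledRows(m: Matrix, names: Array[String]): Seq[(String, Seq[Double])] =
    names.indices.map(i => (names(i), (0 until m.numCols).map(j => m(i, j))))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration; reuse your existing one in practice.
    val spark = SparkSession.builder().master("local[*]").appName("corr-pairs").getOrCreate()
    import spark.implicits._

    val colNames = Array("item_1", "item_2", "item_3", "item_4")

    // Matrices.dense expects column-major values; the matrix is symmetric,
    // so the layout matches the printed rows in the question.
    val correlMatrix: Matrix = Matrices.dense(4, 4, Array(
      1.0, -0.33333333333333254, -0.8164965809277261, -0.7777777777777787,
      -0.33333333333333254, 1.0, 0.8164965809277356, -0.33333333333333254,
      -0.8164965809277261, 0.8164965809277356, 1.0, 0.27216552697591645,
      -0.7777777777777787, -0.33333333333333254, 0.27216552697591645, 1.0))

    // Long format: item_from | item_to | Correlation
    namedPairs(correlMatrix, colNames)
      .toDF("item_from", "item_to", "Correlation")
      .show()

    // Wide-ish format: one row per item with its correlations as an array column.
    labelledRows(correlMatrix, colNames)
      .toDF("item", "correlations")
      .show(truncate = false)

    spark.stop()
  }
}
```

The pairing is done on the driver, which should be fine here because a correlation matrix returned as a local Matrix already fits in driver memory. If you also want the symmetric duplicates or the diagonal, change the inner range from (i + 1) until m.numCols to 0 until m.numCols.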