I have a Spark job that computes the similarity between text documents:
RowMatrix rowMatrix = new RowMatrix(vectorsRDD.rdd());
CoordinateMatrix rowsimilarity = rowMatrix.columnSimilarities(0.5);
JavaRDD<MatrixEntry> entries = rowsimilarity.entries().toJavaRDD();
List<MatrixEntry> list = entries.collect();
for (MatrixEntry s : list) System.out.println(s);
Each MatrixEntry(i, j, value) represents the similarity between two columns (i.e., between two features of the documents). But how can I get the similarity between rows? Suppose I have five documents, Doc1, ..., Doc5: I would like to compute the similarity between every pair of those documents. How can I do that?
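You have to transpose your matrix: columnSimilarities() compares columns, so each document has to become a column first. BlockMatrix provides transpose(), so the conversion path is RowMatrix -> IndexedRowMatrix -> BlockMatrix -> transpose -> BlockMatrix -> IndexedRowMatrix -> RowMatrix: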
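// IndexedRowMatrix needs an RDD<IndexedRow>, so attach a row index to each document vector
JavaRDD<IndexedRow> indexedRows = vectorsRDD.zipWithIndex().map(t -> new IndexedRow(t._2(), t._1()));
IndexedRowMatrix rowMatrix = new IndexedRowMatrix(indexedRows.rdd());
// Transpose so that each document becomes a column, then compare columns
CoordinateMatrix rowsimilarity = rowMatrix.toBlockMatrix().transpose().toIndexedRowMatrix().toRowMatrix().columnSimilarities(0.5);
// MatrixEntry(i, j, value) now holds the similarity of document i and document j (0-based, zipWithIndex order)
JavaRDD<MatrixEntry> entries = rowsimilarity.entries().toJavaRDD();
List<MatrixEntry> list = entries.collect();
for (MatrixEntry s : list) System.out.println(s);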
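If you want to sanity-check the transpose idea locally, here is a small plain-Java sketch with no Spark dependency. It assumes dense double[][] document vectors and computes exact cosine similarity, which is the measure MLlib's columnSimilarities() uses. Passing a threshold such as 0.5 makes Spark use a sampling approach (DIMSUM), so Spark's values may be approximate where this version is exact. `RowSimilaritySketch`, `Entry`, `transpose`, `columnSimilarities` and `rowSimilarities` are hypothetical local names, not Spark's API, and the toy data is made up.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Spark-free sketch: cosine similarity between the columns of a
 * dense matrix, and row (document) similarity obtained by transposing first.
 */
public class RowSimilaritySketch {

    // One similarity result, loosely analogous to MatrixEntry(i, j, value).
    static final class Entry {
        final int i;
        final int j;
        final double value;

        Entry(int i, int j, double value) {
            this.i = i;
            this.j = j;
            this.value = value;
        }

        @Override
        public String toString() {
            return "Entry(" + i + ", " + j + ", " + value + ")";
        }
    }

    // Returns the transpose of an m x n matrix as an n x m matrix (assumes non-empty, rectangular input).
    static double[][] transpose(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[][] t = new double[cols][rows];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                t[c][r] = m[r][c];
        return t;
    }

    // Exact cosine similarity for every column pair i < j; zero-norm columns and
    // zero-similarity pairs are skipped to keep the output sparse.
    static List<Entry> columnSimilarities(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[] norms = new double[cols];
        for (int c = 0; c < cols; c++) {
            double s = 0;
            for (int r = 0; r < rows; r++) s += m[r][c] * m[r][c];
            norms[c] = Math.sqrt(s);
        }
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < cols; i++) {
            for (int j = i + 1; j < cols; j++) {
                if (norms[i] == 0 || norms[j] == 0) continue;
                double dot = 0;
                for (int r = 0; r < rows; r++) dot += m[r][i] * m[r][j];
                double sim = dot / (norms[i] * norms[j]);
                if (sim != 0) out.add(new Entry(i, j, sim));
            }
        }
        return out;
    }

    // Row (document) similarity: transpose so each row becomes a column, then compare columns.
    static List<Entry> rowSimilarities(double[][] m) {
        return columnSimilarities(transpose(m));
    }

    public static void main(String[] args) {
        // Three toy "documents" (rows) over four features (columns); values are made up.
        double[][] docs = {
            {1, 0, 1, 0},
            {1, 0, 1, 0},
            {0, 1, 0, 1}
        };
        for (Entry e : rowSimilarities(docs)) System.out.println(e);
    }
}
```

Running main should print a single entry for documents 0 and 1 (the identical rows) with a value of about 1.0; document 2 shares no features with the others, so its zero-similarity pairs are dropped. Only pairs with i < j are produced, since cosine similarity is symmetric.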