I have clustered some data using Spark, and now I want a similarity score between one specific entry I am interested in and each of the other elements in the same cluster as that entry. Are there any Spark algorithms or methods for this?
I've read about the columnSimilarities() method of RowMatrix, but I am not interested in all-vs-all similarity, only in one specific vector against the set of other vectors.
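It seems like there is no such built-in functionality in Spark. You could use columnSimilarities() and then keep only the entries that involve your item: the entry at (i, j) of the returned CoordinateMatrix is the cosine similarity between columns i and j, so your vectors have to be laid out as the columns of the RowMatrix. The result is upper-triangular (only i < j is stored), so your item can show up as either i or j.
However, that computes every pair just to use a handful of them, which is obviously inefficient, and to be honest it doesn't feel right either.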
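If you do go the columnSimilarities() route anyway, the only extra step is picking out the pairs that involve your item, checking both positions because of the upper-triangular layout. The plain-Python sketch below shows that bookkeeping on a list of (i, j, value) triples; the data and the helper name are hypothetical. In PySpark the equivalent would presumably be a filter over the RDD exposed by the CoordinateMatrix's entries property (MatrixEntry objects with i, j and value fields), but that wiring is an assumption and not shown here. Note that this does not avoid the all-vs-all cost; it only extracts the row you care about.

```python
def similarities_for_item(entries, k):
    """Collect the similarity of item k to every other item.

    entries: iterable of (i, j, value) triples, assumed upper-triangular
    (i < j), so item k can appear as either i or j.
    """
    scores = {}
    for i, j, value in entries:
        if i == k:
            scores[j] = value
        elif j == k:
            scores[i] = value
    return scores


# Hypothetical entries for 4 items (columns 0..3); pairs that are absent
# are simply not stored, as in a sparse matrix.
entries = [(0, 1, 0.9), (0, 2, 0.1), (1, 2, 0.4), (1, 3, 0.7), (2, 3, 0.2)]
print(similarities_for_item(entries, 1))  # {0: 0.9, 2: 0.4, 3: 0.7}
```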
So if I were you, I would look at the implementation of columnSimilarities() and adapt it to compute the similarity of a single item against the others; if it works well, you could contribute it to the Apache Spark project too! ;)
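An item-vs-set adaptation boils down to computing the cosine similarity (the measure columnSimilarities() uses) between the target vector and each other member of its cluster, which is a single pass over the cluster instead of all pairs. The sketch below is a minimal plain-Python version under stated assumptions: the vectors and cluster labels are hypothetical in-memory dicts, and the function names are made up for illustration. In Spark, the same cosine function could plausibly run inside a map over an RDD filtered to the target's cluster, with the target vector broadcast to the workers (cluster labels could come from something like KMeansModel.predict in pyspark.mllib); that distributed wiring is an assumption and is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarities_in_cluster(target_id, vectors, clusters):
    """Score target_id against every other member of its own cluster.

    vectors:  dict id -> vector (list of floats)
    clusters: dict id -> cluster label (e.g. the output of the clustering step)
    Returns (id, score) pairs sorted from most to least similar.
    """
    target_vec = vectors[target_id]
    target_cluster = clusters[target_id]
    scores = [
        (item_id, cosine(target_vec, vec))
        for item_id, vec in vectors.items()
        if item_id != target_id and clusters[item_id] == target_cluster
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: "d" is in a different cluster, so it is skipped.
vectors = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0], "d": [5.0, 5.0]}
clusters = {"a": 0, "b": 0, "c": 0, "d": 1}
for item_id, score in similarities_in_cluster("a", vectors, clusters):
    print(item_id, round(score, 3))
# b 0.707
# c 0.0
```

Because only the target's row is computed, the cost grows linearly with the cluster size rather than quadratically.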