Spark Clustering: How to get a similarity measure of the elements within the same cluster?

Question

I have clustered some data using Spark and now I want to get a similarity score between a specific entry I am interested in and the other elements in the same cluster my entry is in. Are there any Spark algorithms or methods for this?

I've read about the columnSimilarities() method of RowMatrix, but I am not interested in all-vs-all similarity, just the similarity of one specific vector against the set of other vectors.

gsamaras · Accepted Answer

There doesn't seem to be any built-in functionality for this in Spark. You could use columnSimilarities(); the entry at (i, j) of the result is the similarity between items (columns) i and j.

However, that computes every pair, which is clearly inefficient when you only need one item against the rest, and to be honest it doesn't feel like the right tool either.

So if I were you, I would look at the implementation of columnSimilarities() and adapt it for item-pair similarity; if it works well, you could even contribute it to the Apache Spark project! ;)
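
To make the one-vs-cluster idea concrete, here is a minimal plain-Python sketch of the computation, not Spark code and not how columnSimilarities() is implemented. It assumes cosine similarity is the measure you want, and the names `cosine_similarity`, `similarities_within_cluster`, `points` and `assignments` are hypothetical, as is the toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors.

    Returns 0.0 if either vector is all zeros (an assumption made here
    to avoid dividing by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarities_within_cluster(target_id, points, assignments):
    """Score the target against every other member of its own cluster.

    points:      dict mapping item id -> feature vector
    assignments: dict mapping item id -> cluster id (e.g. from k-means)
    """
    target_cluster = assignments[target_id]
    target_vec = points[target_id]
    return {
        pid: cosine_similarity(target_vec, vec)
        for pid, vec in points.items()
        if pid != target_id and assignments[pid] == target_cluster
    }


# Hypothetical toy data: four 2-d points, three of them in cluster 0.
points = {"a": [1.0, 0.0], "b": [2.0, 0.0], "c": [0.0, 1.0], "d": [1.0, 1.0]}
assignments = {"a": 0, "b": 0, "c": 1, "d": 0}

scores = similarities_within_cluster("a", points, assignments)
for pid, score in sorted(scores.items()):
    print(pid, round(score, 4))
```

On a real cluster the same per-item function could presumably be applied in a distributed way: keep only the rows whose predicted cluster matches the target's, then map each remaining row to its similarity with the target vector. This costs one pass over the cluster members instead of computing all pairs.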
