Reputation: 105
I am trying to use the CosineDistance class from Apache Commons Text, but it always returns 1.0. Am I missing something? Here is my code:
import org.apache.commons.text.similarity.CosineDistance;

public class ComputeDistance {
    public static void main(String[] args) throws Exception {
        CosineDistance dist = new CosineDistance();
        CharSequence c1 = "example text1";
        CharSequence c2 = "another file";
        // Prints 1.0 for these two inputs
        System.out.println(dist.apply(c1, c2));
    }
}
Upvotes: 2
Views: 2434
Reputation: 164
CosineDistance returns 1 - cosineSimilarity(leftVector, rightVector). leftVector and rightVector are maps from each word in the char sequence to its number of occurrences. Your two strings have no words in common, so cosineSimilarity(leftVector, rightVector) = 0 and the distance is 1. You can change your code to use the characters of your char sequences instead of the words:
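import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.commons.text.similarity.CosineSimilarity;

public class ComputeDistance {
    public static void main(String[] args) throws Exception {
        CosineSimilarity similarity = new CosineSimilarity();
        String c1 = "example text1";
        String c2 = "another file";
        // split("") yields one-character strings; count how often each occurs
        Map<CharSequence, Integer> leftVector =
            Arrays.stream(c1.split(""))
                .collect(Collectors.toMap(c -> c, c -> 1, Integer::sum));
        Map<CharSequence, Integer> rightVector =
            Arrays.stream(c2.split(""))
                .collect(Collectors.toMap(c -> c, c -> 1, Integer::sum));
        // Distance is 1 minus the similarity of the character-count vectors
        System.out.println(1 - similarity.cosineSimilarity(leftVector, rightVector));
    }
}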
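To see why the word-based version gives 1.0 for the strings in the question, here is a rough plain-Java sketch of the same idea with no library dependency. WordCosineSketch and its methods are made-up names for illustration, and splitting on whitespace is a simplifying assumption; the library's own tokenizer may split text differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Apache Commons Text.
class WordCosineSketch {

    // Count how often each whitespace-separated token occurs (assumed tokenization).
    static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean length of a count vector.
    static double norm(Map<String, Integer> vector) {
        double sum = 0.0;
        for (int count : vector.values()) {
            sum += count * count;
        }
        return Math.sqrt(sum);
    }

    // Dot product over shared words, divided by the product of the vector lengths.
    static double cosineSimilarity(Map<String, Integer> left, Map<String, Integer> right) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : left.entrySet()) {
            dot += entry.getValue() * right.getOrDefault(entry.getKey(), 0);
        }
        double norms = norm(left) * norm(right);
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    static double distance(String a, String b) {
        return 1.0 - cosineSimilarity(wordCounts(a), wordCounts(b));
    }

    public static void main(String[] args) {
        // No shared words: the dot product is 0, so the distance is 1.0
        System.out.println(distance("example text1", "another file"));
        // One shared word ("example"): the distance drops to roughly 0.5
        System.out.println(distance("example text1", "another example"));
    }
}
```

As soon as the two strings share at least one word, the dot product is nonzero and the distance falls below 1, so the original code behaves as expected on inputs with overlapping vocabulary.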
Upvotes: 1