Aaron Clifton

Reputation: 105

apache.commons.text cosine distance

I am trying to use the CosineDistance class from Apache Commons Text, but it always returns 1.0. Am I missing something? Here is my code:

import org.apache.commons.text.similarity.CosineDistance;

public class ComputeDistance {
    public static void main(String[] args) throws Exception {

        CosineDistance dist = new CosineDistance();
        CharSequence c1 = "example text1";
        CharSequence c2 = "another file";
        // Prints 1.0 for these two inputs
        System.out.println(dist.apply(c1, c2));
    }
}

Upvotes: 2

Views: 2434

Answers (1)

Nahuel

Reputation: 164

CosineDistance returns 1 - cosineSimilarity(leftVector, rightVector). leftVector and rightVector are maps from each word to the number of times it occurs in the char sequence. Your two strings have no words in common, so cosineSimilarity(leftVector, rightVector) is 0 and the distance is 1. If you want to compare characters instead of words, you can build the vectors from the characters of your strings and call CosineSimilarity directly:

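import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.commons.text.similarity.CosineSimilarity;

public class ComputeDistance {
  public static void main(String[] args) throws Exception {

    CosineSimilarity similarity = new CosineSimilarity();

    String c1 = "example text1";
    String c2 = "another file";

    // split("") yields one-character strings; count how often each one occurs
    Map<CharSequence, Integer> leftVector =
        Arrays.stream(c1.split(""))
        .collect(Collectors.toMap(c -> c, c -> 1, Integer::sum));
    Map<CharSequence, Integer> rightVector =
        Arrays.stream(c2.split(""))
        .collect(Collectors.toMap(c -> c, c -> 1, Integer::sum));

    // Distance is 1 minus the similarity, as in CosineDistance
    System.out.println(1 - similarity.cosineSimilarity(leftVector, rightVector));

  }
}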
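
To see why the word-level result comes out as exactly 1.0, here is a minimal JDK-only sketch of the same calculation. It is not the library's implementation: the class name CosineSketch and its helper methods are made up for illustration, and it assumes words are split on whitespace, which may differ from how Commons Text tokenizes.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CosineSketch {

    // Hypothetical helper: count occurrences of each whitespace-separated word.
    public static Map<String, Integer> wordCounts(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .collect(Collectors.toMap(Function.identity(), w -> 1, Integer::sum));
    }

    // Euclidean length of a count vector.
    public static double norm(Map<String, Integer> vector) {
        return Math.sqrt(vector.values().stream()
                .mapToDouble(count -> (double) count * count)
                .sum());
    }

    // Cosine similarity: dot product divided by the product of the norms.
    public static double cosineSimilarity(Map<String, Integer> left,
                                          Map<String, Integer> right) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : left.entrySet()) {
            // Only words present in both maps contribute to the dot product.
            dot += entry.getValue() * right.getOrDefault(entry.getKey(), 0);
        }
        double norms = norm(left) * norm(right);
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    // Distance defined as 1 minus the similarity, as described in the answer.
    public static double distance(String a, String b) {
        return 1.0 - cosineSimilarity(wordCounts(a), wordCounts(b));
    }

    public static void main(String[] args) {
        // No shared words: the dot product is 0, so the distance is 1.0.
        System.out.println(distance("example text1", "another file"));
        // One shared word ("example"): the distance drops to about 0.5.
        System.out.println(distance("example text1", "example file"));
    }
}
```

With word-level vectors, any two inputs that share no word have a dot product of 0 and therefore a distance of 1, no matter how similar the strings look character by character.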

Upvotes: 1
