smprj

Reputation: 63

Why do the ws4j online demo values and the source-code demo values differ, especially the Lesk value?

I am trying to find the similarity between two words (for example "home" and "house") using the Lesk measure.

I ran the ws4j demo code for computing the Lesk value given here, and I also computed the value with the online ws4j demo here.

The two give different values:

Values from running the ws4j demo code:
WuPalmer = 0.4
JiangConrath = 0.08467941109843881
LeacockChodorow = 1.1349799328389845
Lin = 0.16528546101187536
Resnik = 1.1692001183611416
Path = 0.1111111111111111
Lesk = 0.0
HirstStOnge = 0.0


Values from the online demo:
wup( home#n#8 , house#n#10 ) = 1.0000
jcn( home#n#8 , house#n#10 ) = 12876699.5
lch( home#n#8 , house#n#10 ) = 3.6889
lin( home#n#8 , house#n#10 ) = 1.0000
res( home#v#1 , house#v#2 ) = 9.0735
path( home#n#8 , house#n#10 ) = 1.0000
lesk( home#n#8 , house#n#10 ) = 1571
hso( home#n#8 , house#n#10 ) = 16

Why is there such a huge difference between the two when they both use ws4j? Is there a problem with the demo code?

Upvotes: 2

Views: 1046

Answers (3)

user3503711

Reputation: 2066

Home and house are both in the same synset, so the wup and jcn values seem right. Which version of the JDK are you using? Try this link: http://maraca.d.umn.edu/cgi-bin/similarity/similarity.cgi?word1=home&senses1=all&word2=house&senses2=all&measure=wup&rootnode=yes

It gives the same result.

Use home#n#1 and house#n#1 in the online version, and it will give the same results as your locally compiled code.
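The online demo labels every score with the sense pair it used, in the word#pos#sense form: home#n#8 is noun sense 8 of "home". As a minimal sketch, the hypothetical `SenseKey` helper below (not part of ws4j) just splits that notation so you can see which senses a given score refers to; sense 1 is the first-listed WordNet sense.

```java
// Hypothetical helper, not part of ws4j: splits WordNet::Similarity-style
// "word#pos#sense" labels such as "home#n#8" into their three parts.
class SenseKey {
    final String word;
    final char pos;   // 'n' = noun, 'v' = verb, 'a' = adjective, 'r' = adverb
    final int sense;  // 1-based sense number; 1 is the first-listed WordNet sense

    SenseKey(String word, char pos, int sense) {
        this.word = word;
        this.pos = pos;
        this.sense = sense;
    }

    static SenseKey parse(String label) {
        String[] parts = label.split("#");
        if (parts.length != 3 || parts[1].length() != 1) {
            throw new IllegalArgumentException("expected word#pos#sense, got: " + label);
        }
        return new SenseKey(parts[0], parts[1].charAt(0), Integer.parseInt(parts[2]));
    }

    public static void main(String[] args) {
        // The pair the online demo picked vs. the first-sense pair suggested above
        for (String label : new String[] {"home#n#8", "house#n#10", "home#n#1", "house#n#1"}) {
            SenseKey k = parse(label);
            System.out.println(k.word + ": pos=" + k.pos + ", sense=" + k.sense);
        }
    }
}
```

Comparing the parsed sense numbers makes it clear that home#n#8/house#n#10 and home#n#1/house#n#1 are different sense pairs, so their scores need not agree.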

Upvotes: 0

Xing Hu

Reputation: 128

For one thing, ws4j does show inconsistencies between its online demo and the last stable release (v1.0.1). You can find a related issue here.

However, in your case the difference arises because the "mfs" flag (Most Frequent Sense) is set to true by default in the ws4j library. When this flag is true, similarity is computed only on the most frequent sense of each word; when it is false, similarity is computed over all sense combinations. Setting it to false is essentially equivalent to @Pranav's answer, which loops over all senses.

As you would expect, the computational cost grows considerably when mfs is false, which is probably why the author made true the default.
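To illustrate the trade-off, here is a simplified sketch (not ws4j's actual implementation) of how an mfs switch changes which sense pairs get scored. The `SenseScorer` interface, the sense counts 9 and 12, and the toy scores are all hypothetical; the only point is that mfs=true scores the single (1, 1) pair, while mfs=false takes the maximum over every pair.

```java
// Simplified sketch, not ws4j code: max-over-senses scoring with an mfs switch.
class SensePairMax {

    // Hypothetical scorer for sense i of word 1 vs. sense j of word 2 (both 1-based)
    interface SenseScorer {
        double score(int i, int j);
    }

    static final class Result {
        final double best;
        final int comparisons;

        Result(double best, int comparisons) {
            this.best = best;
            this.comparisons = comparisons;
        }
    }

    static Result bestScore(int senses1, int senses2, boolean mfs, SenseScorer scorer) {
        if (senses1 == 0 || senses2 == 0) {
            return new Result(0.0, 0); // no senses: report 0, as the answers' code does
        }
        // mfs=true: only the first (most frequent) sense of each word
        int max1 = mfs ? 1 : senses1;
        int max2 = mfs ? 1 : senses2;
        double best = Double.NEGATIVE_INFINITY;
        int comparisons = 0;
        for (int i = 1; i <= max1; i++) {
            for (int j = 1; j <= max2; j++) {
                best = Math.max(best, scorer.score(i, j));
                comparisons++;
            }
        }
        return new Result(best, comparisons);
    }

    public static void main(String[] args) {
        // Toy scores: only the (8, 10) sense pair is closely related
        SenseScorer toy = (i, j) -> (i == 8 && j == 10) ? 1.0 : 0.1;
        Result first = bestScore(9, 12, true, toy);
        Result all = bestScore(9, 12, false, toy);
        System.out.println("mfs=true:  best=" + first.best + ", comparisons=" + first.comparisons);
        System.out.println("mfs=false: best=" + all.best + ", comparisons=" + all.comparisons);
    }
}
```

With these toy inputs, mfs=true makes 1 comparison and misses the related (8, 10) pair, while mfs=false makes 9 × 12 = 108 comparisons and finds it. The cost grows with the product of the two words' sense counts.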

If you want to set the mfs value to false in your code, simply use:

WS4JConfiguration.getInstance().setMFS(false);

Upvotes: 0

Pranav

Reputation: 11

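import edu.cmu.lti.jawjaw.pobj.POS;
import edu.cmu.lti.lexical_db.ILexicalDatabase;
import edu.cmu.lti.lexical_db.NictWordNet;
import edu.cmu.lti.lexical_db.data.Concept;
import edu.cmu.lti.ws4j.Relatedness;
import edu.cmu.lti.ws4j.RelatednessCalculator;
import edu.cmu.lti.ws4j.impl.WuPalmer;
import java.util.Collection;
import java.util.List;

public class AllSensesWuPalmer {
    public static void main(String[] args) {
        ILexicalDatabase db = new NictWordNet();
        String word1 = "house";
        String word2 = "home";
        RelatednessCalculator wup = new WuPalmer(db);

        List<POS[]> posPairs = wup.getPOSPairs();

        double maxScore = -1D;
        String p1 = "", p2 = "";
        // Try every POS pair the measure supports
        for (POS[] posPair : posPairs) {
            Collection<Concept> synsets1 = db.getAllConcepts(word1, posPair[0].toString());
            Collection<Concept> synsets2 = db.getAllConcepts(word2, posPair[1].toString());

            // Compare every sense of word1 with every sense of word2
            for (Concept ss1 : synsets1) {
                for (Concept ss2 : synsets2) {
                    Relatedness relatedness = wup.calcRelatednessOfSynset(ss1, ss2);
                    double score = relatedness.getScore();
                    if (score > maxScore) {
                        maxScore = score;
                        // Record the POS of the best-scoring pair, not just the last pair seen
                        p1 = ss1.getPos().toString();
                        p2 = ss2.getPos().toString();
                    }
                }
            }
        }
        if (maxScore == -1D) {
            maxScore = 0.0; // no senses found for at least one word
        }
        System.out.println("sim('" + word1 + " " + p1 + "', '" + word2 + " " + p2 + "') = " + maxScore);
    }
}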

Upvotes: 1
