Reputation: 89
I'm working on a project and need an open-source tool or technique to compute the semantic similarity of two sentences: I give two sentences as input and receive a score (i.e., their semantic similarity) as output. Any help?
Upvotes: 8
Views: 13935
Reputation: 119
You can try the UMBC Semantic Similarity Service, which is based on the WordNet knowledge base. It includes the UMBC STS (Semantic Textual Similarity) Service. Here is the link: http://swoogle.umbc.edu/StsService/sts.html
Regards,
Upvotes: 0
Reputation: 1469
Salma, I'm afraid this is not the right forum for your question, as it's not directly related to programming. I recommend that you ask it again on the Corpora list. You may also want to search their archives first.
Apart from that, your question is not precise enough, and I'll explain what I mean by that. I assume that your project is about computing the semantic similarity between sentences, and not about something else in which semantic similarity is just one aspect among many. If this is the case, then there are a few things to consider. First of all, neither from the perspective of computational linguistics nor from that of theoretical linguistics is it clear what the term 'semantic similarity' means exactly. There are numerous different views and definitions of it, depending on the type of problem to be solved, the tools and techniques at hand, the background of the person approaching the task, and so on. Consider these examples:
Which of the sentences 2-5 are similar to 1? Sentence 2 is the exact opposite of 1, yet it is still about Pete and Rob (not) finding a dog. 3 is about Pete and Rob, but in a completely different context. 4 is about finding a dog near the station, although the finder is someone else. 5 is about Pete, Rob, a dog, and a 'finding' event, but in a different way than in 1. As for me, I would not be able to rank these examples by their similarity, even leaving aside the need to write a computer program for it.
To compute semantic similarity, you first need to decide what you want to be treated as 'semantically similar' and what not. To compute semantic similarity at the sentence level, you would ideally compare some kind of meaning representation of the sentences. Meaning representations normally come as logical formulas and are extremely complex to generate. However, there are tools which attempt to do this, e.g. Boxer.
As a simplistic but often practical approach, you could define the semantic similarity of two sentences as the sum of the similarities between the words of one sentence and the words of the other. This makes the problem a lot easier, although some difficult issues remain, since the semantic similarity of words is just as poorly defined as that of sentences. To get an impression of this, take a look at the book 'Lexical Semantics' by D.A. Cruse (1986).
Nevertheless, there are quite a number of tools and techniques for computing the semantic similarity between words. Some of them define it basically as the negative distance between two words in a taxonomy like WordNet or the Wikipedia taxonomy (see this paper which describes an API for this). Others compute semantic similarity using statistical measures calculated over large text corpora; they are based on the insight that similar words occur in similar contexts. A third approach to calculating semantic similarity between sentences or words uses vector space models, which you may know from information retrieval. For an overview of these latter techniques, take a look at chapter 8.5 of the book Foundations of Statistical Natural Language Processing by Manning and Schütze.
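To make the vector space idea a bit more concrete, here is a minimal sketch in Java. It assumes the simplest possible setup: each sentence becomes a bag-of-words vector of raw term counts, and similarity is the cosine of the angle between the two vectors. The class and method names (BagOfWordsSimilarity, termCounts, cosine) are made up for this illustration and do not come from any of the tools mentioned in this thread. A real system would at least add weighting such as tf-idf and some handling of synonyms.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: bag-of-words cosine similarity between two sentences.
public class BagOfWordsSimilarity {

    // Lowercase the sentence, split on non-letter characters, and count each token.
    static Map<String, Integer> termCounts(String sentence) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : sentence.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean length of a term-count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the two term-count vectors; 0.0 if either is empty.
    static double cosine(String a, String b) {
        Map<String, Integer> va = termCounts(a);
        Map<String, Integer> vb = termCounts(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(va);
        double normB = norm(vb);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        String a = "Pete and Rob have found a dog near the station.";
        String b = "Pete and Rob have never found a dog near the station.";
        System.out.println(cosine(a, b));
    }
}
```

Note that this pair scores roughly 0.95 even though the second sentence negates the first. That is exactly the limitation discussed above: word overlap says nothing about negation or about who did what to whom.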
Hope this gets you on your feet for now.
Upvotes: 19
Reputation: 99
I have developed a simple open-source tool that does the semantic comparison according to categories: https://sourceforge.net/projects/semantics/files/
It works with sentences of any length, is simple, stable, fast, small in size...
Here is a sample output:
Similarity between the sentences
-Pete and Rob have found a dog near the station.
-Pete and Rob have never found a dog near the station.
is: 1.0000000000
Similarity between the sentences
-Patricia found a dog near the station.
-It was a dog who found Pete and Rob under the snow.
is: 0.7363210405107239
Similarity between the sentences
-Patricia found a dog near the station.
-I am fine, thanks!
is: 0.0
Similarity between the sentences
-Hello there, how are you?
-I am fine, thanks!
is: 0.29160592175990213
USAGE:
import semantics.Compare;
public class USAGE {
    public static void main(String[] args) {
        // The two sentences to compare.
        String a = "This is a first sentence.";
        String b = "This is a second one.";
        // Compare takes both sentences; getResult() returns the similarity score.
        Compare c = new Compare(a, b);
        System.out.println("Similarity between the sentences\n-" + a + "\n-" + b + "\n is: " + c.getResult());
    }
}
Upvotes: 9