Reputation: 93
I am new to NLP. What I am trying to do is this: given two sentences A and B, I want to figure out which words in one sentence are semantically different from every word in the other. Essentially, I need to calculate the similarity between the two sentences, find the words that have low similarity, and print them. I computed the cosine similarity between the sentences, but it doesn't tell me much about word-by-word similarity.
Let's say A = "Lung cancer is a malignant lung tumour" and B = "Lung cancer is a lung disease".
Since "disease" and "tumour" are semantically similar, the word with a low similarity score in A would be "malignant", as it doesn't match any word in B.
How can I do that? Maybe I am looking at this completely wrong, but I need to find the words in A that are not in B while taking semantically similar words into account.
Upvotes: 1
Views: 275
Reputation: 101
One way I can think of is to split both sentences into words, then use something like WordNet to compare every word in one sentence with every word in the other. Define a threshold value: if a word's similarity score does not exceed that threshold for any word in the other sentence, it is probably an outlier. This approach seems a bit primitive, though, and I'd love to read what others have to suggest. A good place to start exploring similarity between words can be this
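To make the idea concrete, here is a minimal sketch of the thresholding approach using only the standard library. The `TOY_SIMILARITY` table, the helper names, and the 0.5 threshold are all assumptions made up for illustration; in practice you would pass in a real word-similarity function (for example one built on WordNet through NLTK) in place of `toy_similarity`.

```python
import re

# Hypothetical similarity scores standing in for a real lexical resource
# such as WordNet; the numbers are invented for illustration only.
TOY_SIMILARITY = {
    frozenset(("tumour", "disease")): 0.8,
}


def toy_similarity(w1, w2):
    """Return 1.0 for identical words, a table score if listed, else 0.0."""
    if w1 == w2:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((w1, w2)), 0.0)


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def find_outliers(source, target, similarity=toy_similarity, threshold=0.5):
    """Return words of `source` whose best match in `target` is below `threshold`."""
    target_words = set(tokenize(target))
    outliers = []
    for word in tokenize(source):
        # Best similarity this word achieves against any word in the other sentence.
        best = max((similarity(word, other) for other in target_words), default=0.0)
        if best < threshold and word not in outliers:
            outliers.append(word)
    return outliers


A = "Lung cancer is a malignant lung tumour"
B = "Lung cancer is a lung disease"
print(find_outliers(A, B))  # ['malignant']
```

With the toy table, "tumour" is matched by "disease" and only "malignant" falls below the threshold. The comparison is quadratic in the number of words, which is fine for single sentences, and the threshold would need tuning for whatever similarity measure you actually plug in.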
Upvotes: 0