neel

Reputation: 9061

Word Sense Disambiguation on Selected words

I have a given set of keywords that I know are relevant to my application. However, these keywords can have different meanings in different contexts, and only one meaning is useful to me, which I know in advance. How can I disambiguate their meanings at runtime?
I have tried various off-the-shelf Word Sense Disambiguation methods, but they give poor results.
Can anyone help me here?

Upvotes: 0

Views: 306

Answers (2)

Nikita Astrakhantsev

Reputation: 4749

Disambiguation is the task of choosing one meaning, from a pre-specified set, for a term (a word, collocation, or keyword) depending on its context. The main idea is to compute the similarity between each meaning and the context, and then choose the closest meaning. It is also very useful to have an a priori distribution over the meanings, for example how often each meaning was used for the term. By the way, the most-frequent-sense algorithm (always pick the most common meaning) is a pretty good baseline.
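
A minimal sketch of that idea, assuming each meaning is described by a small hand-written set of "signature" words. A simplified Lesk-style word overlap stands in for a real similarity measure, and the `senses`, `prior`, and `prior_weight` names and values are hypothetical. The score adds a weighted prior to the overlap, so when the context gives no evidence the choice falls back to the most frequent sense:

```python
def overlap(signature: set[str], context: list[str]) -> int:
    """Count context tokens that appear in a sense's signature words (simplified Lesk)."""
    return sum(1 for token in context if token in signature)


def choose_sense(
    context: list[str],
    senses: dict[str, set[str]],
    prior: dict[str, float],
    prior_weight: float = 0.5,
) -> str:
    """Pick the sense maximizing overlap + prior_weight * prior.

    With no overlapping words, this reduces to the most-frequent-sense baseline.
    """
    return max(
        senses,
        key=lambda sense: overlap(senses[sense], context)
        + prior_weight * prior.get(sense, 0.0),
    )


# Hypothetical senses for the keyword "python", with made-up signature words and priors.
senses = {
    "programming": {"code", "script", "library", "install"},
    "snake": {"reptile", "zoo", "venom", "bite"},
}
prior = {"programming": 0.8, "snake": 0.2}

print(choose_sense(["the", "zoo", "reptile", "house"], senses, prior))  # snake
print(choose_sense(["hello", "world"], senses, prior))  # programming
```

With `prior_weight` at 0.5, one matching context word outweighs the whole prior; how to balance the two terms is a tuning choice for your data.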

So your task is to set an a priori distribution, define a similarity measure, and choose the context. Often it is enough to consider only the local context: the 3 to 5 closest words on each side. The similarity measure depends heavily on your dictionary (the set of meanings per term) and your domain. One example, cosine similarity over tf-idf vectors, is proposed in the other answer.
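
Extracting such a local context window could look like the sketch below, assuming simple lowercase alphanumeric tokenization. The `tokenize`, `local_context`, and `keyword_contexts` helpers are illustrative, not from any library:

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def local_context(tokens: list[str], index: int, window: int = 3) -> list[str]:
    """Return up to `window` tokens on each side of tokens[index], excluding the keyword."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def keyword_contexts(text: str, keyword: str, window: int = 3) -> list[list[str]]:
    """Collect the local context of every occurrence of `keyword` in `text`."""
    tokens = tokenize(text)
    target = keyword.lower()
    return [local_context(tokens, i, window) for i, tok in enumerate(tokens) if tok == target]


print(keyword_contexts("I wrote a Python script to parse the logs.", "python", window=2))
# [['wrote', 'a', 'script', 'to']]
```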

Having this, you can build a binary classifier. Ideally, you would train a machine-learning classifier such as logistic regression, provided you have a training set in which you know precisely, for each keyword occurrence, whether it has the useful meaning or not. If you have only positive examples (which user1981700 seems to assume), then you have something like one-class classification, which usually performs worse.
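
As a rough illustration of the supervised case, here is a tiny logistic regression trained with stochastic gradient descent on binary bag-of-words features of each keyword's context. It is written without libraries so it runs anywhere; in practice you would more likely use an existing implementation such as scikit-learn's `LogisticRegression`. The training contexts, labels (1 = the useful meaning), and hyperparameters are made up for illustration:

```python
import math


def featurize(context: list[str], vocab: list[str]) -> list[float]:
    """Binary bag-of-words vector: 1.0 if a vocabulary word occurs in the context."""
    present = set(context)
    return [1.0 if word in present else 0.0 for word in vocab]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by stochastic gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(w, b, x) -> float:
    """Estimated probability that the keyword has the useful meaning."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up contexts of the keyword "python"; label 1 = the meaning we care about.
train = [
    (["write", "script", "code"], 1),
    (["install", "library", "pip"], 1),
    (["code", "library", "import"], 1),
    (["zoo", "reptile", "snake"], 0),
    (["venom", "bite", "reptile"], 0),
    (["zoo", "snake", "cage"], 0),
]
vocab = sorted({word for context, _ in train for word in context})
X = [featurize(context, vocab) for context, _ in train]
y = [label for _, label in train]
w, b = train_logreg(X, y)

print(predict_proba(w, b, featurize(["script", "library"], vocab)))
print(predict_proba(w, b, featurize(["reptile", "zoo"], vocab)))
```

Words outside the training vocabulary are simply dropped by `featurize`, so they contribute nothing to the prediction.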

Hope this helps. If you provide more details about your domain and the kind of dictionary you have, it will be much easier to come up with a more appropriate solution.

Upvotes: 2

Eric B.

Reputation: 31

Word sense disambiguation is an open problem, so the success of any approach will depend a lot on your particular data. If the keywords supplied at runtime come with enough surrounding context, you could calculate the tf-idf (http://en.wikipedia.org/wiki/Tf%E2%80%93idf) vector of that context and compare it to a pre-established tf-idf vector for the word sense you are interested in. Of course, this means having training data in which only the sense you're interested in occurs. If the two vectors are similar enough (http://en.wikipedia.org/wiki/Cosine_similarity) according to some threshold that you establish experimentally, you could conclude that they are the same sense. Good luck.
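
A minimal sketch of this approach in plain Python, assuming the contexts are already tokenized. The idf weights are computed from the training contexts themselves with a smoothed `log(n / df) + 1` formula, unseen runtime words get the weight of a term that occurred in one document, and the 0.2 threshold is only a placeholder to be tuned experimentally. All function names and data here are hypothetical:

```python
import math
from collections import Counter


def idf_weights(documents: list[list[str]]) -> dict[str, float]:
    """Smoothed idf: log(n / df) + 1, so terms present in every document keep weight 1."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf, default_idf):
    """Sparse tf-idf vector as a dict; unseen terms get `default_idf`."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, default_idf) for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up training contexts in which only the wanted sense of "python" occurs.
training = [
    ["write", "script", "code"],
    ["install", "library", "pip"],
    ["code", "library", "import"],
]
idf = idf_weights(training)
default_idf = math.log(len(training)) + 1.0  # as if the term occurred in one document
profile = tfidf_vector([t for ctx in training for t in ctx], idf, default_idf)

THRESHOLD = 0.2  # placeholder; establish experimentally on held-out data


def is_wanted_sense(context: list[str]) -> bool:
    """True if the context's tf-idf vector is close enough to the sense profile."""
    return cosine(tfidf_vector(context, idf, default_idf), profile) >= THRESHOLD


print(is_wanted_sense(["code", "library", "script"]))  # True
print(is_wanted_sense(["zoo", "reptile", "bite"]))  # False
```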

Upvotes: 2
