algorithm to get topic / focus of sentence out of words in sentence

Question

Are there any well-known or successful algorithms for obtaining the topic and/or focus of a sentence (a question) from the words it contains?

If not, how would I go about getting the topic/focus of the question? It seems that the topic/focus of a question is usually a noun or a noun phrase.

So the first thing I would do is determine the nouns by part-of-speech tagging the question. But then how do I know whether I should take just the nouns, or the noun(s) and an adjective before them, or the noun and the adverb before it, or the noun(s) and the verb?

For example:

In 'did the quick brown fox jump over the lazy dog', get 'quick brown fox', 'jump', and 'lazy dog'.

In 'what is the population of japan', get 'population' and 'japan'.

In 'what color is milk', get 'color' and 'milk'.

In 'What is the height of Mt. Everest', get 'Mt. Everest' and 'height'.

While writing these examples, I am guessing that the easiest way is to remove stop words.
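
A minimal sketch of the stop-word idea, assuming a small hand-written stop list. The `STOP_WORDS` set and the `content_words` function are made up for illustration and do not come from any library:

```python
# Hypothetical, hand-written stop list; a real one would be much larger.
STOP_WORDS = {"what", "is", "the", "of", "did", "over", "a", "an"}


def content_words(question):
    """Return the words of `question` that are not in STOP_WORDS, in order."""
    return [word for word in question.lower().split() if word not in STOP_WORDS]


print(content_words("what is the population of japan"))
print(content_words("what color is milk"))
```

This only gives a flat list of words, so 'quick brown fox' would come back as three separate words rather than one phrase.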

CTsiddharth · Accepted Answer

This can be treated as a parsing problem, and I personally find the Stanford NLP tools very effective.

The Stanford Parser has an online demo.

For the example 'did the quick brown fox jump over the lazy dog', the output you get is:

did/VBD
the/DT
quick/JJ
brown/JJ
fox/NN
jump/VB
over/RP
the/DT
lazy/JJ
dog/NN

From this output you can write an extractor that pulls out the nouns (and the adjectives and adverbs, if need be) and thus obtain the topics of the sentence.
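
Such an extractor could look roughly like the sketch below. It assumes the tagger output is available as `word/TAG` tokens in one string, and it groups runs of adjectives and nouns, keeping only runs that contain at least one noun. The function name `extract_topics` and the grouping rule are illustrative choices, not part of the Stanford tools:

```python
TAGGED = ("did/VBD the/DT quick/JJ brown/JJ fox/NN jump/VB "
          "over/RP the/DT lazy/JJ dog/NN")


def extract_topics(tagged):
    """Collect runs of adjective (JJ*) and noun (NN*) tokens that contain a noun."""
    topics, words, has_noun = [], [], False
    # The trailing "/END" sentinel forces the last run to be flushed.
    for token in tagged.split() + ["/END"]:
        word, _, tag = token.rpartition("/")
        if tag.startswith(("JJ", "NN")):
            words.append(word)
            has_noun = has_noun or tag.startswith("NN")
        else:
            if has_noun:
                topics.append(" ".join(words))
            words, has_noun = [], False
    return topics


print(extract_topics(TAGGED))  # ['quick brown fox', 'lazy dog']
```

Verbs such as 'jump' are skipped here; adding them would need a separate rule so that they are not merged into the neighbouring noun run.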

Moreover, the parse tree looks like this:

(ROOT
  (SINV (VBD did)
    (NP (DT the) (JJ quick) (JJ brown) (NN fox))
    (VP (VB jump)
      (PRT (RP over))
      (NP (DT the) (JJ lazy) (NN dog)))))

If you take a closer look at the parse tree, the outputs you are expecting are the two NPs (noun phrases): 'the quick brown fox' and 'the lazy dog'.
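
To pull the NPs out programmatically, one option is to read the bracketed tree and collect the words under every NP node. The sketch below is plain Python with no NLP library. `read_tree`, `leaves` and `noun_phrases` are hypothetical helper names, and the tree is assumed to be in the bracketed form shown above:

```python
import re

PARSE = """(ROOT
  (SINV (VBD did)
    (NP (DT the) (JJ quick) (JJ brown) (NN fox))
    (VP (VB jump)
      (PRT (RP over))
      (NP (DT the) (JJ lazy) (NN dog)))))"""


def read_tree(text):
    """Parse a bracketed tree into (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words under a node, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


def noun_phrases(tree):
    """Return the text of every NP node, in document order."""
    label, children = tree
    found = [" ".join(leaves(tree))] if label == "NP" else []
    for child in children:
        if isinstance(child, tuple):
            found.extend(noun_phrases(child))
    return found


print(noun_phrases(read_tree(PARSE)))  # ['the quick brown fox', 'the lazy dog']
```

The determiners are still attached, so to get 'quick brown fox' you would drop the DT children, and for nested NPs this sketch reports both the outer and the inner phrase.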

I hope this helps!
