Reputation: 75
I am trying to extract the noun phrases from sentences. I am using the OpenNLP library with the "en-parser-chunking.bin" model.
Code example:
ArrayList<opennlp.tools.parser.Parse> nounPhrases = new ArrayList<>();
searchmethod("what is the nickname of the British flag?");
for (int t = 0; t < 50; t++)
{
    str = text.get(t);
    InputStream is = new FileInputStream("en-parser-chunking.bin");
    ParserModel model = new ParserModel(is);
    opennlp.tools.parser.Parser parser = ParserFactory.create(model);
    opennlp.tools.parser.Parse[] topParses = ParserTool.parseLine(str, parser, 1);
    for (opennlp.tools.parser.Parse p : topParses) {
        p.show();
        if (p.getType().equals("NP")) {
            nounPhrases.add(p);
        }
    }
}
With this code I get the following result:
(TOP (S (NP (NP (DT The) (NN nickname)) (PP (IN for) (NP (DT the) (JJ British) (NN flag)))) (VP (VBZ is) (NP (NP (DT the) (NNP Union) (NNP Jack.)) (SBAR (IN Although) (S (NP (PRP it)) (VP (VBZ is) (ADVP (RB only) (RB correctly)) (VP (VBN known) (PP (IN as) (NP (DT this) (NN when) (NN flown))) (PP (IN on) (NP (DT a) (NN ship.)))))))))))
How can I extract the noun phrases from that result?
Any help would be greatly appreciated.
Upvotes: 4
Views: 1674
Reputation: 1
Hi, I agree with the answer, but if you look closely at your output there is a problem in the parse tree that will cause wrong chunks to be detected.
In the above example the following PP is identified, which is wrong because "flown" can never be an NN. I believe correct POS tagging is the key. Please let me know if you need to know how the POS tagging can be corrected. Thanks.
(PP
    (IN as)
    (NP
        (DT this) (NN when) (NN flown)
    )
)
Upvotes: 0
Reputation: 1654
You could extract the NPs from that, but there's a model at http://opennlp.sourceforge.net/models-1.5/en-chunker.bin that does just chunking (i.e. noun phrase detection) without building a full grammatical parse. This might be easier to use, but it requires tokenizing and POS tagging steps before it can run.
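If you do want to pull the NPs out of the parser output, here is a rough sketch of one way to do it. It does not use OpenNLP at all: it just walks the bracketed string that p.show() prints and collects the words under every node labelled NP. The class name NounPhraseExtractor and the method extract are made up for this example, and it assumes the input is a well-formed bracketed tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: collects the text of every NP
// node found in a Penn-Treebank-style bracketed parse string.
class NounPhraseExtractor {

    // One open constituent: its label and the words seen beneath it so far.
    private static final class Frame {
        final String label;
        final List<String> words = new ArrayList<>();
        Frame(String label) { this.label = label; }
    }

    static List<String> extract(String tree) {
        List<String> phrases = new ArrayList<>();
        Deque<Frame> open = new ArrayDeque<>();
        int i = 0;
        while (i < tree.length()) {
            char c = tree.charAt(i);
            if (c == '(') {
                // Read the constituent label that follows the opening bracket.
                int end = i + 1;
                while (end < tree.length() && !isDelimiter(tree.charAt(end))) end++;
                open.push(new Frame(tree.substring(i + 1, end)));
                i = end;
            } else if (c == ')') {
                if (!open.isEmpty()) {
                    Frame done = open.pop();
                    if (done.label.equals("NP")) {
                        phrases.add(String.join(" ", done.words));
                    }
                    // Pass the words up so enclosing phrases cover them too.
                    if (!open.isEmpty()) open.peek().words.addAll(done.words);
                }
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;
            } else {
                // A leaf token (the actual word).
                int end = i;
                while (end < tree.length() && !isDelimiter(tree.charAt(end))) end++;
                if (!open.isEmpty()) open.peek().words.add(tree.substring(i, end));
                i = end;
            }
        }
        return phrases;
    }

    private static boolean isDelimiter(char c) {
        return c == '(' || c == ')' || Character.isWhitespace(c);
    }
}
```

Nested NPs come out innermost first, so "The nickname" and "the British flag" are listed before the enclosing "The nickname for the British flag". If you would rather work on the live Parse object, the same walk can be done recursively with getChildren(), getType() and getCoveredText(); the check in the question only looks at the top node, whose type is TOP, which is why nothing is ever added to the list.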
Upvotes: 1