Reputation: 217
I am new to Stanford NLP. I am using the lexicalized parser to parse the contents of a file and extract the noun phrases. Generating the tree structure for each line takes a long time.
I am using a Tregex pattern to get the noun phrases from each line.
The file I am parsing is 1 MB, and it takes more than two hours to parse it and extract the noun phrases.
Here is the full code I am using.
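import java.util.List;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

// Parse one line into a constituency tree (this is the slow step).
Tree x = parser.apply(line);
System.out.println("tree s==" + x);
// Match every NP node that immediately dominates a noun tag (NN, NNS, NNP, ...).
TregexPattern NPpattern = TregexPattern.compile("@NP <@/NN.?/");
TregexMatcher matcher = NPpattern.matcher(x);
while (matcher.findNextMatchingNode()) {
    Tree match = matcher.getMatch();
    List<TaggedWord> tWord = match.taggedYield();
    // Collect only the noun tokens of the matched phrase.
    StringBuilder str = new StringBuilder();
    for (TaggedWord word : tWord) {
        String tag = word.tag();
        if (tag.equals("NN") || tag.equals("NNS") || tag.equals("NNP")) {
            str.append(word.value()).append(' ');
        }
    }
}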
How can I improve the performance, or is there another way to optimize this code?
Thanks in advance, Gouse.
Upvotes: 0
Views: 639
Reputation: 9450
Full constituency parsing of text is just kind of slow. If you stick with it, there may not be much that you can do.
But a couple of things to mention: (i) If you're not using the englishPCFG.ser.gz grammar, then you should, because it's faster than using englishFactored.ser.gz, and (ii) parsing very long sentences is especially slow, so if you can get by with omitting or breaking up very long sentences (say, over 70 words), that can help a lot. In particular, if some of the text is from web scraping or whatever and has long lists of stuff that aren't really sentences, filtering or dividing them may help a lot.
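As a rough illustration of point (ii), here is a minimal sketch of dropping or breaking up over-long sentences before they reach the parser. It is plain Java with no Stanford classes; the class name SentenceLengthFilter and its methods are made up for this example, and it assumes each sentence has already been tokenized into a list of words. The 70-word limit is just the figure suggested above, not a tuned value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Stanford tools.
public class SentenceLengthFilter {

    // Assumed threshold, taken from the "say, over 70 words" suggestion.
    public static final int MAX_WORDS = 70;

    // Breaks a token list into consecutive pieces of at most maxWords tokens.
    public static List<List<String>> splitLong(List<String> tokens, int maxWords) {
        List<List<String>> pieces = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start += maxWords) {
            int end = Math.min(start + maxWords, tokens.size());
            pieces.add(new ArrayList<>(tokens.subList(start, end)));
        }
        return pieces;
    }

    // Keeps only the sentences that have at most maxWords tokens.
    public static List<List<String>> dropLong(List<List<String>> sentences, int maxWords) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> sentence : sentences) {
            if (sentence.size() <= maxWords) {
                kept.add(sentence);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Simulate a 150-token "sentence", e.g. a scraped list of items.
        List<String> longLine = new ArrayList<>();
        for (int i = 0; i < 150; i++) {
            longLine.add("w" + i);
        }
        System.out.println(splitLong(longLine, MAX_WORDS).size() + " pieces");
    }
}
```

Splitting at a fixed word count cuts phrases at arbitrary points, so for real text it is probably better to split at punctuation such as semicolons or list separators, and to fall back to dropping the sentence when no sensible break exists.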
The other direction you could go is that you appear to not really need a full parser but just an NP chunker (something that identifies minimal noun phrases in a text). These can be much faster as they don't build recursive structure. There isn't one at present among the Stanford NLP tools, but you can find some by searching for this term on the web.
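To give a feel for what such a chunker does, here is a toy sketch in plain Java. It is not one of the chunkers you would find on the web and not part of the Stanford tools; the class SimpleNpChunker is invented for illustration. It assumes the text has already been run through a POS tagger that emits Penn Treebank tags, and it only recognizes the flat pattern "optional determiner, any adjectives, one or more nouns" in a single left-to-right pass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy chunker; real NP chunkers are usually trained models.
public class SimpleNpChunker {

    // Returns flat noun phrases matching (DT)? (JJ*)* (NN*)+ over
    // parallel lists of words and Penn Treebank tags of equal length.
    public static List<String> chunk(List<String> words, List<String> tags) {
        List<String> phrases = new ArrayList<>();
        int i = 0;
        while (i < tags.size()) {
            int j = i;
            // Optional determiner.
            if (tags.get(j).equals("DT")) {
                j++;
            }
            // Any number of adjectives (JJ, JJR, JJS).
            while (j < tags.size() && tags.get(j).startsWith("JJ")) {
                j++;
            }
            // One or more nouns (NN, NNS, NNP, NNPS).
            int nounStart = j;
            while (j < tags.size() && tags.get(j).startsWith("NN")) {
                j++;
            }
            if (j > nounStart) {
                // At least one noun was found: emit the phrase and skip past it.
                phrases.add(String.join(" ", words.subList(i, j)));
                i = j;
            } else {
                // No noun here: move on by one token and try again.
                i++;
            }
        }
        return phrases;
    }
}
```

Because it never builds nested structure, the work is linear in the number of tokens, which is where the speed advantage over full parsing comes from. The price is that it misses anything the flat pattern does not cover, such as noun phrases containing prepositional phrases or conjunctions.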
Upvotes: 1