Reputation: 69
In the script lexparser.sh, the Stanford parser runs with the command
java -mx150m -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser \
-outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz $*
However, when parsing a sentence with 59 words in it, I get the error
* WARNING!! OUT OF MEMORY! THERE WAS NOT ENOUGH MEMORY TO RUN ALL PARSERS. EITHER GIVE THE JVM MORE MEMORY, SET THE MAXIMUM SENTENCE LENGTH WITH -maxLength, OR PERHAPS YOU ARE HAPPY TO HAVE THE PARSER FALL BACK TO USING A SIMPLER PARSER FOR VERY LONG SENTENCES. *
According to the FAQ, 350 MB should be enough to parse a 100-word sentence. But when I change -mx150m to -mx350m (or -mx1000m), I get the same memory issue. This makes me think that I'm not actually assigning more memory to the program. What can I do to test how much memory I'm assigning, and actually assign more?
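For reference, one way to see how much heap the JVM actually grants for a given flag (assuming an OpenJDK/HotSpot java, which supports -XX:+PrintFlagsFinal) would be:
java -mx350m -XX:+PrintFlagsFinal -version | grep -i maxheapsize
which should report a MaxHeapSize close to 350 MB (in bytes) if the flag is taking effect.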
Upvotes: 1
Views: 454
Reputation: 1281
I found the following line:
nltk.internals.config_java(options='-xmx4G')
in this thread: How to parse large data with nltk stanford pos tagger in Python. But it didn't resolve my OSErrors. The error I got started with
OSError: Java command failed : ['/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java', '-mx1000m',
leading me to believe that it still has just 1G of memory assigned. If anyone has found a solution to this, I'd be very interested to learn about it.
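One lead that might explain the -mx1000m in that command line: as far as I can tell, NLTK's Stanford wrapper classes (e.g. nltk.parse.stanford.StanfordParser) call config_java again with their own java_options argument on every run, which would override anything set globally beforehand, and that argument defaults to -mx1000m in at least some NLTK versions. A minimal sketch of passing the memory option there instead (the jar and model paths below are placeholders):

from nltk.parse.stanford import StanfordParser

parser = StanfordParser(
    path_to_jar='/path/to/stanford-parser.jar',            # placeholder paths
    path_to_models_jar='/path/to/stanford-parser-models.jar',
    model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz',
    java_options='-Xmx4g',  # passed to the java command instead of the wrapper's default -mx1000m
)

long_sentence = "..."  # the 59-word sentence goes here
for tree in parser.raw_parse(long_sentence):
    print(tree)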
Upvotes: 0
Reputation: 57
The correct way to specify the max heap size to Java is:
java -Xmx1g .....
Not sure why they have mentioned only -mx on the FAQ page instead of -Xmx.
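Applied to the command from lexparser.sh above, that would just mean swapping the memory flag, e.g. something like:
java -Xmx1g -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser \
 -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz $*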
Upvotes: 0