Reputation: 69
In the script lexparser.sh, the Stanford parser runs with the command
java -mx150m -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser \
-outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz $*
However, when parsing a sentence with 59 words in it, I get the error
* WARNING!! OUT OF MEMORY! THERE WAS NOT ENOUGH MEMORY TO RUN ALL PARSERS. EITHER GIVE THE JVM MORE MEMORY, SET THE MAXIMUM SENTENCE LENGTH WITH -maxLength, OR PERHAPS YOU ARE HAPPY TO HAVE THE PARSER FALL BACK TO USING A SIMPLER PARSER FOR VERY LONG SENTENCES. *
According to the FAQ, 350 MB should be enough to parse a 100-word sentence. But when I change -mx150m to -mx350m (or -mx1000m), I get the same memory issue. This makes me think that I'm not actually assigning more memory to the program. What can I do to test how much memory I'm assigning, and actually assign more?
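For reference, one way to see how much heap the JVM actually grants for a given flag (assuming an OpenJDK/HotSpot java, which supports -XX:+PrintFlagsFinal) would be:
java -mx350m -XX:+PrintFlagsFinal -version | grep -i maxheapsize
which should report a MaxHeapSize close to 350 MB (in bytes) if the flag is taking effect.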
Upvotes: 1
Views: 454
Reputation: 1281
I found the following line:
nltk.internals.config_java(options='-xmx4G')
in this thread: How to parse large data with nltk stanford pos tagger in Python. But it didn't resolve my OSErrors. The error I got started with
OSError: Java command failed : ['/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java', '-mx1000m',
leading me to believe that it still has just 1G of memory assigned. If anyone has found a solution to this, I'd be very interested to learn about it.
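One lead that might explain the -mx1000m in that command line: as far as I can tell, NLTK's Stanford wrapper classes (e.g. nltk.parse.stanford.StanfordParser) call config_java again with their own java_options argument on every run, which would override anything set globally beforehand, and that argument defaults to -mx1000m in at least some NLTK versions. A minimal sketch of passing the memory option there instead (the jar and model paths below are placeholders):

from nltk.parse.stanford import StanfordParser

parser = StanfordParser(
    path_to_jar='/path/to/stanford-parser.jar',            # placeholder paths
    path_to_models_jar='/path/to/stanford-parser-models.jar',
    model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz',
    java_options='-Xmx4g',  # passed to the java command instead of the wrapper's default -mx1000m
)

long_sentence = "..."  # the 59-word sentence goes here
for tree in parser.raw_parse(long_sentence):
    print(tree)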
Upvotes: 0
Reputation: 57
The correct way to specify the max heap size to Java is:
java -Xmx1g .....
Not sure why they have mentioned only -mx on the FAQ page instead of -Xmx.
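Applied to the command from lexparser.sh above, that would just mean swapping the memory flag, e.g. something like:
java -Xmx1g -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser \
 -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz $*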
Upvotes: 0