Reputation: 1432
I have a class called Extract. Its main method is supposed to read lines from a text file and output them to another text file. This is the overview:
public class Extract {
    public static void main(String[] args) {
        try {
            Scanner br = new Scanner(new FileInputStream(file_from));                        // file to read from
            PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(file_to)));  // file to write to

            while (br.hasNextLine()) {
                ArrayList<String> sentences = new ArrayList<String>();
                String some_sentence;

                // read the next batch of up to 1000 lines
                for (int i = 0; i < 1000 && br.hasNextLine(); i++) {
                    some_sentence = br.nextLine();
                    sentences.add(some_sentence);
                }

                for (int i = 0; i < sentences.size(); i++) {
                    some_sentence = sentences.get(i);

                    // prepare sentence to be parsed
                    Tree parsed = lp.parse(some_sentence);
                    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
                    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
                    GrammaticalStructure gs = gsf.newGrammaticalStructure(parsed);
                    Collection<TypedDependency> tdl = gs.typedDependencies();
                    Iterator<TypedDependency> itr = tdl.iterator();

                    out.println(some_sentence);
                    out.println("\n");
                    System.out.println(++count);

                    while (itr.hasNext()) {
                        TypedDependency temp = itr.next();
                        out.println(temp);
                    }
                }
            }
        } catch (Exception e) {
            System.out.println("Something failed");
            e.printStackTrace();
        }
    }
}
Given that I fill the list sentences with 1000 new strings in each iteration of the while loop, could this cause my program to stop running? My program is exiting with the following error message:
NOT ENOUGH MEMORY TO PARSE SENTENCES OF LENGTH 500
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.createArrays(ExhaustivePCFGParser.java:2203)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.considerCreatingArrays(ExhaustivePCFGParser.java:2173)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.parse(ExhaustivePCFGParser.java:346)
at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.parseInternal(LexicalizedParserQuery.java:238)
at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.parse(LexicalizedParserQuery.java:530)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:301)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:279)
at Pubmedparse2.main(Pubmedparse2.java:52)
Could it be that garbage collection is not working properly when I release 1000 dead String objects in every iteration of the while loop? (As a note, the packages listed in the error belong to a library used to parse sentences into grammatical relationships.)
Thanks for any help.
Upvotes: 0
Views: 368
Reputation: 328774
While the garbage collector is a mysterious beast that pounces on even experienced developers in unexpected ways, I doubt it's involved in this case (unless your input file is bigger than 10 MB).
My guess is that you made a mistake in the code. Use a real debugger or poor man's debugging to see what the code is really doing.
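For example, a quick println right before the parse call will show you exactly which input blows up. This is only a minimal sketch, reusing the variables from your posted loop:

// poor man's debugging: log the batch index and sentence length before parsing
System.out.println("Parsing sentence " + i + " (" + some_sentence.length() + " chars)");
Tree parsed = lp.parse(some_sentence);

If the printed lengths are much larger than you expect, the input is not being split into sentences the way you think it is.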
[EDIT] It seems you're using the Stanford NLP parser.
From the documentation:
at least 100MB to run as a PCFG parser on sentences up to 40 words in length; typically around 500MB of memory to be able to parse similarly long typical-of-newswire sentences using the factored model
Check the documentation of your Java VM to find out how to give it more memory.
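For example, with a HotSpot JVM the maximum heap size is set with the -Xmx flag; the 2g below is only an illustration (pick what your machine can spare), and the classpath is left out:

java -Xmx2g Extract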
Upvotes: 2