Reputation: 45
I am using Stanford NER in my web application, with english.muc.7class.distsim.crf.ser.gz (16 MB) as the classifier. When I deploy and run the application, I get a heap space OutOfMemoryError while the classifier is loading.
I have tried stripping the code down to the essentials, and I have checked that it is not creating too many objects and filling the heap, but with no success.
Is it because of the size of the classifier? I want to keep using this classifier, so what should I do?
I have increased the heap size locally using the VM options in Tomcat. But I cannot increase the VM heap size on the actual server where I will host my application, and that does not seem like the right way either.
Can anyone guide me about this?
Upvotes: 1
Views: 620
Reputation: 5642
I agree with Christopher's suggestions: you don't need to worry about the size.
For robust performance, though, load the classifier only once, when the server starts, via a static method or a listener, and keep it in a long-lived Java thread. Then reuse that same instance for all further annotation.
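As a minimal sketch of the load-once idea, the holder below runs its loader at most once and hands every caller the same instance. `LoadOnce` is a hypothetical helper name, not part of Stanford NER; in a real web application the loader would wrap whatever call deserializes the model, and the first `get()` could be triggered from a startup listener.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: holds one lazily created instance.
 * The loader runs at most once, even with concurrent callers,
 * provided it returns a non-null value.
 */
final class LoadOnce<T> {
    private final Supplier<T> loader;
    private volatile T value;

    LoadOnce(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        T v = value;
        if (v == null) {
            // Double-checked locking: only the first caller pays the load cost.
            synchronized (this) {
                v = value;
                if (v == null) {
                    v = loader.get();
                    value = v;
                }
            }
        }
        return v;
    }
}
```

A single `static final LoadOnce<...>` field shared by all request handlers would then keep one copy of the model on the heap instead of one per request.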
Upvotes: 0
Reputation: 9450
Yes, you basically shouldn't worry much about the size of the code, since memory use is dominated by the size of the data loaded.
Model data: The classifier models simply take a lot of space. You seem to need a heap of about 140 MB to load the current (2012) version of english.muc.7class.distsim.crf.ser.gz. The models are just a lot of Strings and doubles, but they grow considerably from their size on disk for three reasons: the on-disk data is compressed; as is well known, String objects in Java each take a large amount of space; and the Strings are linked via a HashMap, which takes more space still. The String data alone seems to end up taking about 72 MB in memory (36 MB of char[] data, 36 MB of String objects).
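To see whether the running JVM has anything like that much room, one option is to query the standard `Runtime` API before and after loading the model. The sketch below is only illustrative: `HeapCheck` and its method names are hypothetical, and the used-heap figure is approximate because it includes garbage that has not been collected yet.

```java
/** Hypothetical helper for rough heap measurements, in megabytes. */
final class HeapCheck {
    private static final long MB = 1024L * 1024L;

    /** Upper limit the JVM will try to use for the heap. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently in use (approximate; includes uncollected garbage). */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** True if the remaining headroom is at least requiredMb. */
    static boolean canHold(long requiredMb) {
        return maxHeapMb() - usedHeapMb() >= requiredMb;
    }
}
```

Logging `usedHeapMb()` just before and just after the classifier is loaded gives a rough figure for the model's footprint in a particular deployment.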
Data to be analyzed: This depends on how you're calling it and may not be a problem in your case with Tomcat, but if NER is run on a file, it will read the whole file into memory before classifying. So you can reduce memory use by giving it multiple smaller units (files, Strings, or whatever) to classify.
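One way to feed smaller units, sketched here under the assumption that the input can safely be split at line boundaries, is to stream the text and hand bounded chunks to a callback one at a time. `ChunkedInput` and `forEachChunk` are hypothetical names; the handler is where the call into the classifier would go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/** Hypothetical helper: streams text to a handler in line-aligned chunks. */
final class ChunkedInput {
    /**
     * Reads source line by line and passes chunks of roughly maxChars
     * characters to handler. A single line longer than maxChars is passed
     * on its own as one oversized chunk.
     */
    static void forEachChunk(Reader source, int maxChars, Consumer<String> handler) {
        try (BufferedReader reader = new BufferedReader(source)) {
            StringBuilder chunk = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                // Flush first if adding this line would exceed the limit.
                if (chunk.length() > 0 && chunk.length() + line.length() + 1 > maxChars) {
                    handler.accept(chunk.toString());
                    chunk.setLength(0);
                }
                chunk.append(line).append('\n');
            }
            if (chunk.length() > 0) {
                handler.accept(chunk.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only one chunk is held in memory at a time, so peak usage for the input is bounded by the chunk size rather than the file size. Entities that span a chunk boundary could be missed, so splitting at paragraph or document boundaries is safer where the data allows it.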
Also, you're much more likely to get prompt help on questions like this with the tag stanford-nlp.
Upvotes: 1