Reputation: 1269
I'm trying to use Stanford CoreNLP to do coreference resolution on chunks of text relating to a given topic. While loading the StanfordCoreNLP models it at first ran out of memory completely, and even now it still takes upwards of 15 minutes to load.
I have code like:
public Map<Integer, CorefChain> getCoreferences(String text) {
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Annotation document = new Annotation(text);
    pipeline.annotate(document);
    return document.get(CorefCoreAnnotations.CorefChainAnnotation.class);
}
Is this unavoidable by design? Will it even be possible to do coreference resolution like this in a production application where anything more than 10 seconds is unacceptable?
Upvotes: 3
Views: 167
Reputation: 669
Yes, it's much faster if you don't instantiate StanfordCoreNLP inside your method, since constructing it reloads all the models every call. Store it as a class variable instead.
More specifically, move the following to outside your method:
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
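For example, something like the following (a minimal sketch — the class name CoreferenceResolver is just illustrative, and the exact import paths for the coref classes vary between CoreNLP versions; in newer releases they live under edu.stanford.nlp.coref rather than edu.stanford.nlp.dcoref):

import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class CoreferenceResolver {

    // Built once: the expensive model loading happens here,
    // not on every call to getCoreferences.
    private final StanfordCoreNLP pipeline;

    public CoreferenceResolver() {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        pipeline = new StanfordCoreNLP(props);
    }

    public Map<Integer, CorefChain> getCoreferences(String text) {
        Annotation document = new Annotation(text);
        pipeline.annotate(document);  // fast compared to pipeline construction
        return document.get(CorefCoreAnnotations.CorefChainAnnotation.class);
    }
}

That way you pay the model-loading cost once (e.g. at application startup), and each subsequent annotate call only does the per-document work.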
Hope it helps! ;)
Upvotes: 2