Reputation: 137
I have a .csv file containing the IMDb sentiment analysis dataset. Each instance is a paragraph. I am using Stanza https://stanfordnlp.github.io/stanza/client_usage.html to get a parse tree for each instance.
text = "Chris Manning is a nice person. Chris wrote a simple sentence. He also gives oranges to people."
with CoreNLPClient(
        annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse', 'depparse', 'coref'],
        timeout=30000,
        memory='16G') as client:
    ann = client.annotate(text)
Right now, the server restarts for every instance, which takes a lot of time since I have 50k instances.
1
Starting server with command: java -Xmx16G -cp /home/wahab/treeattention/stanford-corenlp-
4.0.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 1200000 -threads
5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-a74576b3341f4cac.props
-preload parse
2
Starting server with command: java -Xmx16G -cp /home/wahab/treeattention/stanford-corenlp-
4.0.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 1200000 -threads
5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-d09e0e04e2534ae6.props
-preload parse
Is there any way to pass a file or do batching?
Upvotes: 1
Views: 908
Reputation: 8739
You should only start the server once. The easiest approach is to load the file in Python, extract each paragraph, and submit the paragraphs one at a time: pass each paragraph from your IMDB file to the annotate() method. The server will handle sentence splitting.
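A minimal sketch of that loop, assuming the CSV has a header with a text column (the column name `review` and the helper names here are illustrative, not from your file):

```python
import csv

def iter_paragraphs(csv_path, text_column="review"):
    # Yield one paragraph (review text) per CSV row.
    # `text_column` is an assumption -- adjust it to your file's header.
    with open(csv_path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            yield row[text_column]

def annotate_all(csv_path, text_column="review"):
    # Imported here so the CSV helper above is usable on its own.
    from stanza.server import CoreNLPClient

    trees = []
    # Enter the `with` block ONCE: the Java server starts a single time
    # and stays up for all 50k annotate() calls.
    with CoreNLPClient(
            annotators=['tokenize', 'ssplit', 'pos', 'lemma',
                        'ner', 'parse', 'depparse', 'coref'],
            timeout=30000,
            memory='16G') as client:
        for paragraph in iter_paragraphs(csv_path, text_column):
            ann = client.annotate(paragraph)
            # One constituency tree per sentence; the server
            # handles the sentence splitting within each paragraph.
            trees.extend(s.parseTree for s in ann.sentence)
    return trees
```

If you only need `parse`, trimming the annotator list will also speed things up considerably, since `coref` in particular is expensive.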
Upvotes: 1