Reputation: 137
I have a .csv file containing the IMDb sentiment analysis dataset. Each instance is a paragraph. I am using Stanza https://stanfordnlp.github.io/stanza/client_usage.html to get a parse tree for each instance.
text = "Chris Manning is a nice person. Chris wrote a simple sentence. He also gives oranges to people."
with CoreNLPClient(
        annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse', 'depparse', 'coref'],
        timeout=30000,
        memory='16G') as client:
    ann = client.annotate(text)
Right now, the server restarts for every instance, which takes a lot of time since I have 50k instances.
1
Starting server with command: java -Xmx16G -cp /home/wahab/treeattention/stanford-corenlp-
4.0.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 1200000 -threads
5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-a74576b3341f4cac.props
-preload parse
2
Starting server with command: java -Xmx16G -cp /home/wahab/treeattention/stanford-corenlp-
4.0.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 1200000 -threads
5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-d09e0e04e2534ae6.props
-preload parse
Is there any way to pass a file or do batching?
Upvotes: 1
Views: 908
Reputation: 8739
You should only start the server once. The easiest approach is to load the file in Python, extract each paragraph, and submit the paragraphs one at a time: pass each paragraph from your IMDB file to the annotate() method. The server will handle sentence splitting.
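A minimal sketch of that loop, assuming the CSV has a header with a text column (the column name `review` and the helper names here are illustrative, not from your file):

```python
import csv

def iter_paragraphs(csv_path, text_column="review"):
    # Yield one paragraph (review text) per CSV row.
    # `text_column` is an assumption -- adjust it to your file's header.
    with open(csv_path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            yield row[text_column]

def annotate_all(csv_path, text_column="review"):
    # Imported here so the CSV helper above is usable on its own.
    from stanza.server import CoreNLPClient

    trees = []
    # Enter the `with` block ONCE: the Java server starts a single time
    # and stays up for all 50k annotate() calls.
    with CoreNLPClient(
            annotators=['tokenize', 'ssplit', 'pos', 'lemma',
                        'ner', 'parse', 'depparse', 'coref'],
            timeout=30000,
            memory='16G') as client:
        for paragraph in iter_paragraphs(csv_path, text_column):
            ann = client.annotate(paragraph)
            # One constituency tree per sentence; the server
            # handles the sentence splitting within each paragraph.
            trees.extend(s.parseTree for s in ann.sentence)
    return trees
```

If you only need `parse`, trimming the annotator list will also speed things up considerably, since `coref` in particular is expensive.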
Upvotes: 1