Reputation: 3249
I am trying to understand the Coreference NLP Stanford tools. This is my code and it is working:
import os
os.environ["CORENLP_HOME"] = "/home/daniel/StanfordCoreNLP/stanford-corenlp-4.0.0"
from stanza.server import CoreNLPClient

text = 'When he came from Brazil, Daniel was fortified with letters from Conan but otherwise did not know a soul except Herbert. Yet this giant man from the Northeast, who had never worn an overcoat or experienced a change of seasons, did not seem surprised by his past.'

with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse', 'depparse', 'coref'],
                   properties={'annotators': 'coref', 'coref.algorithm': 'neural'},
                   timeout=30000, memory='16G') as client:
    ann = client.annotate(text)

chains = ann.corefChain
chain_dict = dict()
for index_chain, chain in enumerate(chains):
    chain_dict[index_chain] = {}
    chain_dict[index_chain]['ref'] = ''
    chain_dict[index_chain]['mentions'] = [{'mentionID': mention.mentionID,
                                            'mentionType': mention.mentionType,
                                            'number': mention.number,
                                            'gender': mention.gender,
                                            'animacy': mention.animacy,
                                            'beginIndex': mention.beginIndex,
                                            'endIndex': mention.endIndex,
                                            'headIndex': mention.headIndex,
                                            'sentenceIndex': mention.sentenceIndex,
                                            'position': mention.position,
                                            'ref': '',
                                            } for mention in chain.mention]

for k, v in chain_dict.items():
    print('key', k)
    mentions = v['mentions']
    for mention in mentions:
        words_list = ann.sentence[mention['sentenceIndex']].token[mention['beginIndex']:mention['endIndex']]
        mention['ref'] = ' '.join(t.word for t in words_list)
        print(mention['ref'])
I tried three algorithms, changing coref.algorithm in the properties:

statistical (output):

he this giant man from the Northeast , who had never worn an overcoat or experienced a change of seasons Daniel his

neural (output):

this giant man from the Northeast , who had never worn an overcoat or experienced a change of seasons , his

deterministic (I got the error below):
> Starting server with command: java -Xmx16G -cp
> /home/daniel/StanfordCoreNLP/stanford-corenlp-4.0.0/*
> edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout
> 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties
> corenlp_server-9fedd1e9dfb14c9e.props -preload
> tokenize,ssplit,pos,lemma,ner,parse,depparse,coref Traceback (most
> recent call last):
>
> File "<ipython-input-58-0f665f07fd4d>", line 1, in <module>
> runfile('/home/daniel/Documentos/Working Papers/Leader traits/Code/20200704 - Modeling
> Organizing/understanding_coreference.py',
> wdir='/home/daniel/Documentos/Working Papers/Leader
> traits/Code/20200704 - Modeling Organizing')
>
> File
> "/home/daniel/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py",
> line 827, in runfile
> execfile(filename, namespace)
>
> File
> "/home/daniel/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py",
> line 110, in execfile
> exec(compile(f.read(), filename, 'exec'), namespace)
>
> File "/home/daniel/Documentos/Working Papers/Leader
> traits/Code/20200704 - Modeling
> Organizing/understanding_coreference.py", line 21, in <module>
> ann = client.annotate(text)
>
> File
> "/home/daniel/anaconda3/lib/python3.7/site-packages/stanza/server/client.py",
> line 470, in annotate
> r = self._request(text.encode('utf-8'), request_properties, **kwargs)
>
> File
> "/home/daniel/anaconda3/lib/python3.7/site-packages/stanza/server/client.py",
> line 404, in _request
> raise AnnotationException(r.text)
>
> AnnotationException: java.lang.RuntimeException:
> java.lang.IllegalArgumentException: No enum constant
> edu.stanford.nlp.coref.CorefProperties.CorefAlgorithmType.DETERMINISTIC
Questions:
Why am I getting this error with the deterministic algorithm?

Any piece of code using Stanford NLP in Python seems to be much slower than comparable code using spaCy or NLTK. I know there is no coreference resolution in those other libraries, but, for instance, when I use from nltk.parse.stanford import StanfordDependencyParser
for dependency parsing, it is much faster than this Stanford NLP library. Is there any way to accelerate this CoreNLPClient in Python?

I will use this library to work with long texts. Is it better to work with smaller pieces or with the entire text? Can long texts cause wrong results for coreference resolution (I have found very strange results from this coreference library when using long texts)? Is there an optimal size?
Results:
The results from the statistical algorithm seem to be better. I expected the best result to come from the neural algorithm. Do you agree with me? There are 4 valid mentions with the statistical algorithm but only 2 with the neural algorithm.
Am I missing something?
Upvotes: 2
Views: 1567
Reputation: 126
You may find the list of supported algorithms in the Java documentation: link
You might want to start the server once and then reuse it for all your documents, something like
# Here's the slowest part—models are being loaded
client = CoreNLPClient(...)
ann = client.annotate(text)
...
client.stop()
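Fleshing that pattern out (the texts list and the choice of the statistical algorithm are placeholders, not anything from your setup): the expensive step is launching the Java server and loading the models, which the with-statement in your code repeats for every run of the script. Keeping one client alive amortizes that cost across documents.

```python
from stanza.server import CoreNLPClient

texts = ['First document ...', 'Second document ...']  # placeholder inputs

client = CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'lemma',
                                   'ner', 'parse', 'depparse', 'coref'],
                       properties={'coref.algorithm': 'statistical'},
                       timeout=30000, memory='16G')
try:
    # The first annotate() pays the server start-up and model-loading cost;
    # subsequent calls reuse the same running server.
    for text in texts:
        ann = client.annotate(text)
        print(len(ann.corefChain), 'coref chains')
finally:
    client.stop()  # shut the Java server down when you are done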
But I cannot give you any clue regarding questions 3 and 4.
Upvotes: 1