Reputation: 1
Following the instructions at https://stanfordnlp.github.io/CoreNLP/coref.html#running-on-conll-2012, here is the code I used to try to reproduce the Chinese coreference results on CoNLL-2012:
import java.util.Properties;

import edu.stanford.nlp.coref.CorefSystem;
import edu.stanford.nlp.util.StringUtils;

public class TestCoref {
    public static void main(String[] args) throws Exception {
        // Start from any command-line arguments, then set the coref options by hand.
        Properties props = StringUtils.argsToProperties(args);
        props.setProperty("props", "edu/stanford/nlp/coref/properties/neural-chinese-conll.properties");
        props.setProperty("coref.data", "path-to/data/conll-2012");
        props.setProperty("coref.conllOutputPath", "path-to-output/conll-results");
        props.setProperty("coref.scorer", "path-to/reference-coreference-scorers/v8.01/scorer.pl");
        CorefSystem coref = new CorefSystem(props);
        coref.runOnConll(props);
    }
}
As output, I got three files named like these:
date-time.coref.predicted.txt
date-time.coref.gold.txt
date-time.predicted.txt
but all of them are EMPTY!
I got my "conll-2012" data as follows:
First I downloaded the train/dev/test-key data from http://conll.cemantix.org/2012/data.html, as well as ontonotes-release-5.0 from the LDC. Then I ran the script skeleton2conll.sh provided with the official CoNLL-2012 data, which produced the *_conll files.
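A quick sanity check for this step could look like the sketch below. It uses only the Java standard library and counts the non-empty files under the data directory whose names end in "_conll". The class name ConllDataCheck and the method countNonEmpty are made up for illustration, and the check says nothing about which sub-directory CoreNLP actually reads from; it only confirms that skeleton2conll.sh produced files with content.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ConllDataCheck {

    // Count regular files under root whose name ends with the given suffix
    // and that contain at least one byte.
    static long countNonEmpty(Path root, String suffix) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(suffix))
                .filter(p -> p.toFile().length() > 0)
                .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // The default path is the placeholder from the question, not a real location.
        Path root = Paths.get(args.length > 0 ? args[0] : "path-to/data/conll-2012");
        System.out.println("non-empty *_conll files: " + countNonEmpty(root, "_conll"));
    }
}
```

If the count is zero, the problem is in the data preparation rather than in the coreference code.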
The model I used was downloaded from http://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar
While trying to find the problem, I noticed that the class CorefSystem (https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/coref/CorefSystem.java) has a method "annotate" which seems to do the real work, but it is not used at all.
I wonder whether there is a bug in the runOnConll function, which doesn't seem to read and annotate anything. If not, how can I reproduce the coreference results?
PS:
I especially want to produce results on conversational data such as "tc" and "bc" in CoNLL-2012. As far as I can tell, the coreference API only lets me parse plain text. Is there any other way to use the neural coref system on conversational data (where the different speakers should be indicated), apart from running on CoNLL-2012?
Thanks in advance for your help!
Upvotes: 0
Views: 152
Reputation: 8739
As a start, why don't you run this command from the command line:
java -Xmx10g -cp stanford-corenlp-3.9.1.jar:stanford-chinese-corenlp-models-3.9.1.jar:* edu.stanford.nlp.coref.CorefSystem -props edu/stanford/nlp/coref/properties/neural-chinese-conll.properties -coref.data <path-to-conll-data> -coref.conllOutputPath <where-to-save-system-output> -coref.scorer <path-to-scoring-script>
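One difference from the code in the question that may matter: on the command line, -props is part of the arguments, and as far as I know StringUtils.argsToProperties loads the named properties file when it sees that key. Calling props.setProperty("props", ...) afterwards only stores a string, so the settings in that file would never be read. I have not confirmed that this is the cause of the empty files. The sketch below illustrates the difference using only java.util.Properties; the class PropsLoadingSketch, the method mergeDefaults, and the file contents are hypothetical stand-ins, not CoreNLP code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class PropsLoadingSketch {

    // Merge key/value pairs read from a properties source into target,
    // without overwriting keys the caller has already set explicitly.
    static void mergeDefaults(Properties target, Reader source) throws IOException {
        Properties loaded = new Properties();
        loaded.load(source);
        for (String key : loaded.stringPropertyNames()) {
            if (target.getProperty(key) == null) {
                target.setProperty(key, loaded.getProperty(key));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // This only stores a string under the key "props"; no file is read.
        props.setProperty("props", "edu/stanford/nlp/coref/properties/neural-chinese-conll.properties");
        System.out.println(props.getProperty("coref.language")); // null: nothing was loaded

        // Stand-in for the contents of the real file (hypothetical values).
        String fileContents = "coref.language = zh\ncoref.algorithm = neural\n";
        mergeDefaults(props, new StringReader(fileContents));
        System.out.println(props.getProperty("coref.language")); // now set from the source
    }
}
```

In the question's code, the simplest way to test this idea is to pass -props and the path as program arguments instead of calling setProperty("props", ...), so the run matches the command above.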
Upvotes: 0