Grace Wang

Reputation: 1

Empty output when reproducing Chinese coreference results on CoNLL-2012 using the CoreNLP neural system

Following the instructions at https://stanfordnlp.github.io/CoreNLP/coref.html#running-on-conll-2012, I tried to reproduce the Chinese coreference results on CoNLL-2012 with this code:

import java.util.Properties;

import edu.stanford.nlp.coref.CorefSystem;
import edu.stanford.nlp.util.StringUtils;

public class TestCoref {

    public static void main(String[] args) throws Exception {
        // Start from any command-line arguments, then set the coref options by hand.
        Properties props = StringUtils.argsToProperties(args);
        props.setProperty("props", "edu/stanford/nlp/coref/properties/neural-chinese-conll.properties");
        props.setProperty("coref.data", "path-to/data/conll-2012");
        props.setProperty("coref.conllOutputPath", "path-to-output/conll-results");
        props.setProperty("coref.scorer", "path-to/reference-coreference-scorers/v8.01/scorer.pl");

        // Build the coreference system and run it over the CoNLL-2012 data.
        CorefSystem coref = new CorefSystem(props);
        coref.runOnConll(props);
    }
}

As output, I got three files named like this:

date-time.coref.predicted.txt
date-time.coref.gold.txt
date-time.predicted.txt

but all of them are empty.

I got my "conll-2012" data as follows:

First I downloaded the train/dev/test-key data from http://conll.cemantix.org/2012/data.html, as well as OntoNotes Release 5.0 from the LDC. Then I ran the script skeleton2conll.sh, which is provided with the official CoNLL-2012 data and which produced the _conll files.
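As a sanity check on this data-preparation step, it may help to confirm that the _conll files really exist under the directory passed as coref.data. The following is only a minimal sketch using the Java standard library: the class name ConllDataCheck and its method are hypothetical (not part of CoreNLP), the "_conll" suffix is taken from the description above, and the sketch makes no claim about the directory layout CoreNLP itself expects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of CoreNLP.
class ConllDataCheck {

    // Counts regular files anywhere under root whose file name ends with suffix.
    static long countFilesWithSuffix(Path root, String suffix) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(suffix))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path from the question; pass the real directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "path-to/data/conll-2012");
        long count = countFilesWithSuffix(root, "_conll");
        System.out.println(count + " file(s) ending in _conll under " + root);
    }
}
```

A count of zero would suggest that the conversion script did not write its output where the coreference system is being pointed; a non-zero count only shows that the files exist, not that they are being read.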

The model I used was downloaded from http://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar

While looking for the problem, I noticed that the class CorefSystem has a method "annotate" that seems to do the real work, but it is not used at all. https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/coref/CorefSystem.java

I wonder whether there is a bug in the runOnConll function, so that it doesn't read and annotate anything. If not, how can I reproduce the coreference results?

PS:

I especially want to produce results on conversational data such as "tc" and "bc" in CoNLL-2012. With the coreference API, I find that I can only process textual data. Is there any other way to use the Neural Coref System on conversational data (where the different speakers should be indicated), apart from running on CoNLL-2012?

Thanks in advance for your help!

Upvotes: 0

Views: 152

Answers (1)

StanfordNLPHelp

Reputation: 8739

As a start, why don't you run this command from the command line:

java -Xmx10g -cp stanford-corenlp-3.9.1.jar:stanford-chinese-corenlp-models-3.9.1.jar:* edu.stanford.nlp.coref.CorefSystem -props edu/stanford/nlp/coref/properties/neural-chinese-conll.properties -coref.data <path-to-conll-data> -coref.conllOutputPath <where-to-save-system-output> -coref.scorer <path-to-scoring-script>
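If launching from Java is more convenient than typing this in a shell, the same invocation can be assembled as an argument list and handed to ProcessBuilder, which avoids shell quoting and globbing. This is only a sketch under stated assumptions: the class CorefCommand and its build method are hypothetical helpers (not part of CoreNLP), the jar names are assumed to be the 3.9.1 CoreNLP and Chinese models jars in the current directory, and the three paths are supplied by the caller. Note that the properties file is passed here as a -props argument rather than set with setProperty, as in the command line; whether that difference accounts for the empty output is not confirmed here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of CoreNLP.
class CorefCommand {

    // Assembles the java command line for CorefSystem as a list of arguments.
    static List<String> build(String classpath, String dataPath,
                              String outputPath, String scorerPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx10g");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("edu.stanford.nlp.coref.CorefSystem");
        cmd.add("-props");
        cmd.add("edu/stanford/nlp/coref/properties/neural-chinese-conll.properties");
        cmd.add("-coref.data");
        cmd.add(dataPath);
        cmd.add("-coref.conllOutputPath");
        cmd.add(outputPath);
        cmd.add("-coref.scorer");
        cmd.add(scorerPath);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: CorefCommand <conll-data> <output-dir> <scorer.pl>");
            return;
        }
        // Assumed jar names; "*" is the JVM's own classpath wildcard for jars in this directory.
        String classpath = String.join(File.pathSeparator,
                "stanford-corenlp-3.9.1.jar",
                "stanford-chinese-corenlp-models-3.9.1.jar",
                "*");
        List<String> cmd = build(classpath, args[0], args[1], args[2]);
        // Run the child JVM and forward its output to this console.
        int exitCode = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println("exit code: " + exitCode);
    }
}
```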

Upvotes: 0
