Simon

Reputation: 352

Stanford Relation Extractor custom model selects only one token of relation entities

I've successfully trained a Relation Extractor model and created a .ser file.

However, although the model finds the relation, when one of its entities spans multiple tokens, only one of those tokens is selected. For example, given a relation called Friend_of and a sentence like:

Sam Tarly's best friend is Jon Snow.

The model will find a relation of type Friend_of between the following entities:

This causes my tests to mark this as a false positive and the model as a whole to get a bad score.

I've tried training a custom NER model using the same training data, and then using this custom NER model to train the RelationExtractor model with the following properties in my props file:

trainUsePipelineNER=true
ner.model=path/to/custom-ner-model.ser.gz

But that didn't solve the problem.

Is this just a problem of not enough training data or is there something I'm missing here?

Here is the Java code I use to get the relations:

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, depparse, relation");
props.put("sup.relation.model", "lib/custom-relation-model-pipeline.ser");
props.put("pos.ptb3Escaping", "false");

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

List<Relation> foundRelations = new ArrayList<>();

for (String doc : documents) {
    Annotation document = new Annotation(doc);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {

        List<RelationMention> relationMentions = sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class);

        for (RelationMention relation : relationMentions) {
            foundRelations.add(new Relation(relation.getArg(0).getValue(), relation.getType(), relation.getArg(1).getValue()));
        }

    }
}

Thank you!

Simon.

Upvotes: 2

Views: 705

Answers (3)

jw_nu

Reputation: 41

I know this should probably be a comment, but I'm not able to comment yet. I've also been trying to train a relation extraction model, but haven't been successful. Is there any chance you'd be willing to share a GitHub repo or more information about how you were able to add a new relation? I'm trying to do almost the exact same thing, but keep getting stuck. Thanks!

Upvotes: 0

StanfordNLPHelp

Reputation: 8739

So I looked into the MachineReading relation extraction some more.

I think you want to replace getValue() with getExtentString() and see if that helps.

I ran on a sample sentence with our default model:

Joe Smith works at Google.

And it worked properly.
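To illustrate why switching from a single-token value to the full extent matters, here is a minimal, self-contained sketch. It does not call CoreNLP at all: the class ExtentSketch, the tokenization of the example sentence, and the span indices are assumptions made for illustration only. The idea is simply that a mention's extent is a token span, and joining every token in that span yields the multi-token name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not a CoreNLP class.
class ExtentSketch {

    // Joins the tokens in the half-open span [start, end) with single spaces.
    static String extentString(List<String> tokens, int start, int end) {
        return String.join(" ", tokens.subList(start, end));
    }

    public static void main(String[] args) {
        // Assumed tokenization of the sentence from the question.
        List<String> tokens = Arrays.asList(
                "Sam", "Tarly", "'s", "best", "friend", "is", "Jon", "Snow", ".");

        // Assumed extent of the second entity: tokens 6 (inclusive) to 8 (exclusive).
        int start = 6;
        int end = 8;

        // Reading only one token of the span loses part of the name.
        System.out.println(tokens.get(end - 1));              // prints "Snow"

        // Reading the whole extent keeps the full name.
        System.out.println(extentString(tokens, start, end)); // prints "Jon Snow"
    }
}
```

In the question's loop, the corresponding change would be calling getExtentString() on each argument instead of getValue(), as suggested above.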

Upvotes: 1

StanfordNLPHelp

Reputation: 8739

I think you might want to try training the KBPAnnotator with your custom relationship "friend_of". Then you can use kbp instead of relation in your pipeline, and kbp has better support for handling full mentions. When you are done training your model file, you can run the pipeline with -kbp.model set to the path where you saved the statistical model.
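As a rough sketch of what the pipeline configuration might look like with kbp in place of relation, the snippet below only builds the Properties object. The annotator list is an assumption (check it against the CoreNLP KBP documentation for your version), and the class name, method name, and model path are hypothetical placeholders.

```java
import java.util.Properties;

// Hypothetical helper; only java.util.Properties is used here.
class KbpPropsSketch {

    // Builds pipeline properties that use kbp instead of relation.
    static Properties kbpProps(String modelPath) {
        Properties props = new Properties();
        // Assumed annotator list; kbp replaces the relation annotator.
        props.setProperty("annotators",
                "tokenize,ssplit,pos,lemma,ner,parse,coref,kbp");
        // Path to the statistical model produced by your own training run.
        props.setProperty("kbp.model", modelPath);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder path, not a real file.
        Properties props = kbpProps("path/to/custom-kbp-model.ser");
        System.out.println(props.getProperty("annotators"));
        System.out.println(props.getProperty("kbp.model"));
    }
}
```

The resulting Properties object would then be passed to new StanfordCoreNLP(props), in the same way as in the code from the question.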

1.) Study the main method of KBPStatisticalExtractor to see how training is done.

https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/ie/KBPStatisticalExtractor.java

2.) I think you need to add your new "friend_of" relationship to the list of known relations in KBPRelationExtractor.java

https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/ie/KBPRelationExtractor.java

3.) You need to put your training data into CoNLL format. Here is an example sentence in the CoNLL training format. Note how the subject and object of the relation are specified, and note that the first line for the example sentence is the relation label, "per:employee_of". Separate all of the sentences in your training data with a blank line. Note that the columns are separated by tabs.

per:employee_of
DEE	SUBJECT	PERSON	-	-	NNP	PERSON	compound	2
DEE	SUBJECT	PERSON	-	-	NNP	PERSON	compound	2
MYERS	SUBJECT	PERSON	-	-	NNP	PERSON	ROOT	-1
,	-	-	-	-	,	O	punct	2
White	-	-	OBJECT	ORGANIZATION	NNP	LOCATION	compound	13
House	-	-	OBJECT	ORGANIZATION	NNP	LOCATION	compound	13
Press	-	-	-	-	NNP	O	dep	13
Secretary	-	-	-	-	NNP	O	dep	13
The	-	-	-	-	NNP	O	det	9
first	-	-	-	-	JJ	ORDINAL	nsubj	13
is	-	-	-	-	VBZ	O	cop	13
the	-	-	-	-	DT	O	det	13
US	-	-	-	-	NNP	LOCATION	compound	13
interests	-	-	-	-	NNS	O	appos	2
in	-	-	-	-	IN	O	case	15
Haiti	-	-	-	-	NNP	LOCATION	nmod	13
and	-	-	-	-	CC	O	cc	13
in	-	-	-	-	IN	O	case	19
the	-	-	-	-	DT	O	det	19
region	-	-	-	-	NN	O	conj	13
.	-	-	-	-	.	O	punct	2
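If you generate the training file programmatically, the layout described above (relation label on the first line, one tab-separated row of nine fields per token, a blank line between sentences) could be produced along these lines. This is only a sketch: the class and method names are hypothetical, and the parameter names are my reading of the columns in the example, not names taken from the CoreNLP source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for writing training examples in the layout shown above.
class ConllExampleWriter {

    // One token row: nine fields joined by tabs. Use "-" for an empty field.
    static String tokenRow(String word, String subj, String subjNer,
                           String obj, String objNer, String pos,
                           String ner, String depRel, int head) {
        return String.join("\t", word, subj, subjNer, obj, objNer,
                pos, ner, depRel, Integer.toString(head));
    }

    // One example: the relation label, then one row per token, each on its own line.
    static String example(String relation, List<String> rows) {
        return relation + "\n" + String.join("\n", rows) + "\n";
    }

    // Whole file: every example ends in a newline, so joining with "\n"
    // leaves exactly one blank line between consecutive examples.
    static String trainingFile(List<String> examples) {
        return String.join("\n", examples);
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        // First rows of the sample sentence, copied from the example above.
        rows.add(tokenRow("DEE", "SUBJECT", "PERSON", "-", "-", "NNP", "PERSON", "compound", 2));
        rows.add(tokenRow("MYERS", "SUBJECT", "PERSON", "-", "-", "NNP", "PERSON", "ROOT", -1));
        rows.add(tokenRow("White", "-", "-", "OBJECT", "ORGANIZATION", "NNP", "LOCATION", "compound", 13));

        List<String> examples = new ArrayList<>();
        examples.add(example("per:employee_of", rows));
        System.out.print(trainingFile(examples));
    }
}
```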

Let me know if you need any more advice or help on this project!

Upvotes: 1
