Reputation: 352
I've successfully trained a Relation Extractor model and created a .ser file.
However, I'm running into an issue: the model finds the relation, but when one of its entities consists of multiple tokens, only one of those tokens is selected. For example, for a relation called Friend_of, and a sentence like:
Sam Tarly's best friend is Jon Snow.
The model will find a relation of type Friend_of between the following entities:
This causes my tests to mark this as a false positive and the model as a whole to get a bad score.
I've tried training a custom NER model using the same training data, and then using this custom NER model to train the RelationExtractor model with the following properties in my props file:
trainUsePipelineNER=true
ner.model=path/to/custom-ner-model.ser.gz
But that didn't solve the problem.
Is this just a problem of not enough training data or is there something I'm missing here?
Here is the Java code I use to get the relations:
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, depparse, relation");
props.put("sup.relation.model", "lib/custom-relation-model-pipeline.ser");
props.put("pos.ptb3Escaping", "false");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
List<Relation> foundRelations = new ArrayList<>();
for (String doc : documents) {
    Annotation document = new Annotation(doc);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        List<RelationMention> relationMentions = sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class);
        for (RelationMention relation : relationMentions) {
            foundRelations.add(new Relation(relation.getArg(0).getValue(), relation.getType(), relation.getArg(1).getValue()));
        }
    }
}
Thank you!
Simon.
Upvotes: 2
Views: 705
Reputation: 41
I know this should probably be a comment, but I'm not able to comment yet. I've also been trying to train a relation extraction model, but haven't been successful. Is there any chance you'd be willing to share a GitHub repo or more information about how you were able to add a new relation? I'm trying to do almost the exact same thing, but keep getting stuck. Thanks!
Upvotes: 0
Reputation: 8739
So I looked into the MachineReading relation extraction some more.
I think you want to replace getValue() with getExtentString() and see if that helps.
I ran on a sample sentence with our default model:
Joe Smith works at Google.
And it worked properly.
Upvotes: 1
Reputation: 8739
I think you might want to try training the KBPAnnotator with your custom relationship "friend_of". Then you can use kbp instead of relation in your pipeline, and kbp has better support for handling full mentions. When you are done training your model file, you can run the pipeline with -kbp.model set to the path where you saved the statistical model.
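For reference, here is a minimal sketch of what the pipeline properties might look like once kbp replaces relation. The kbp.model property name comes from the -kbp.model flag mentioned above; the model path and the KbpPipelineProps class name are hypothetical, and the annotator list is my assumption about what kbp needs upstream, so check it against the CoreNLP version you are using.

```java
import java.util.Properties;

final class KbpPipelineProps {

    // Builds properties for a pipeline that uses "kbp" instead of "relation".
    // Assumption: kbp depends on the annotators listed before it here.
    static Properties build(String modelPath) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,coref,kbp");
        // Same property as the -kbp.model command-line flag.
        props.setProperty("kbp.model", modelPath);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical path: use wherever you saved the trained model.
        Properties props = build("path/to/custom-kbp-model.ser.gz");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The resulting Properties object would then be passed to new StanfordCoreNLP(props), the same way as in the question's code.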
1.) Study the main method of KBPStatisticalExtractor to see how training is done.
2.) I think you need to add your new "friend_of" relationship to the list of known relations in KBPRelationExtractor.java
https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/ie/KBPRelationExtractor.java
3.) You need to put your training data into CoNLL format. Here is an example sentence in the CoNLL training format. Note how the subject and object of the relation are specified, and note that the first line of the example sentence is the relation label, "per:employee_of". Separate the sentences in your training data with a blank line. Note that the columns are separated by tabs.
per:employee_of
DEE SUBJECT PERSON - - NNP PERSON compound 2
DEE SUBJECT PERSON - - NNP PERSON compound 2
MYERS SUBJECT PERSON - - NNP PERSON ROOT -1
, - - - - , O punct 2
White - - OBJECT ORGANIZATION NNP LOCATION compound 13
House - - OBJECT ORGANIZATION NNP LOCATION compound 13
Press - - - - NNP O dep 13
Secretary - - - - NNP O dep 13
The - - - - NNP O det 9
first - - - - JJ ORDINAL nsubj 13
is - - - - VBZ O cop 13
the - - - - DT O det 13
US - - - - NNP LOCATION compound 13
interests - - - - NNS O appos 2
in - - - - IN O case 15
Haiti - - - - NNP LOCATION nmod 13
and - - - - CC O cc 13
in - - - - IN O case 19
the - - - - DT O det 19
region - - - - NN O conj 13
. - - - - . O punct 2
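If it helps, here is a rough sketch of how rows in that layout could be generated for the sentence from the question. The nine-column order (word, subject marker, subject type, object marker, object type, POS tag, NER tag, dependency label, governor index) is inferred from the example above rather than taken from a documented spec, and the governor index looks 0-based with -1 for the root. The class name, the "friend_of" label, and the POS and dependency values are hand-written placeholders, not parser output.

```java
import java.util.ArrayList;
import java.util.List;

final class ConllRelationExample {

    // One tab-separated token row. Pass null for subjType/objType when the
    // token is not part of the subject/object mention.
    static String row(String word, String subjType, String objType,
                      String pos, String ner, String depLabel, int governor) {
        return String.join("\t",
                word,
                subjType == null ? "-" : "SUBJECT",
                subjType == null ? "-" : subjType,
                objType == null ? "-" : "OBJECT",
                objType == null ? "-" : objType,
                pos, ner, depLabel, Integer.toString(governor));
    }

    // A training sentence: relation label on the first line, then the rows.
    static String sentence(String relation, List<String> rows) {
        return relation + "\n" + String.join("\n", rows) + "\n";
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        // Every token of a multi-token mention carries the SUBJECT/OBJECT marker.
        rows.add(row("Sam", "PERSON", null, "NNP", "PERSON", "compound", 1));
        rows.add(row("Tarly", "PERSON", null, "NNP", "PERSON", "nmod:poss", 4));
        rows.add(row("'s", null, null, "POS", "O", "case", 1));
        rows.add(row("best", null, null, "JJS", "O", "amod", 4));
        rows.add(row("friend", null, null, "NN", "O", "nsubj", 7));
        rows.add(row("is", null, null, "VBZ", "O", "cop", 7));
        rows.add(row("Jon", null, "PERSON", "NNP", "PERSON", "compound", 7));
        rows.add(row("Snow", null, "PERSON", "NNP", "PERSON", "ROOT", -1));
        rows.add(row(".", null, null, ".", "O", "punct", 7));
        // Print a blank line after each sentence when writing several of them.
        System.out.println(sentence("friend_of", rows));
    }
}
```

In real training data the POS, NER, and dependency columns would come from running the pipeline over each sentence, not from hand annotation.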
Let me know if you need any more advice or help on this project!
Upvotes: 1