Reputation: 1468
I have created a custom Named Entity Recognition (NER) classifier and a custom Relationship Extraction (RE) classifier. In the training data for the RE classifier, I gave it a set of 10 sentences in which I annotated the exact entities and the relationships between them.
When I run the code, I get the correct relationships for 6 out of the 10 sentences, but not for the remaining 4. I want to understand why the RE code is not able to identify the correct relationships in those sentences even though the exact same sentences appear in the training data.
For example, the following sentence:
The Fund's objective is to help our members achieve the best possible RetOue.
In the training data, the relationship given is
Fund RetOue build
Below are all the RelationMentions found in the sentence. It can be seen that the relation between "Fund" and "RetOue" comes out as _NR with a probability of 0.6074190677382846, while the actual relation build has a lower probability of 0.26265263651796966. It is the second one in the list below:
RelationMention [type=_NR, start=1, end=9, {_NR, 0.8706606065870188; build, 0.04609463244214589; reply, 0.014127678851794745; cause, 0.01412618987143006; deliver, 0.014028667880335159; calculate, 0.014026673364224201; change, 0.013888249765034161; collaborate, 0.01304730123801706}
EntityMention [type=RESOURCE, objectId=EntityMention-10, hstart=1, hend=2, estart=1, eend=2, headPosition=1, value="Fund", corefID=-1]
EntityMention [type=ROLE, objectId=EntityMention-11, hstart=8, hend=9, estart=8, eend=9, headPosition=8, value="members", corefID=-1]
]
RelationMention [type=_NR, start=1, end=14, {_NR, 0.6074190677382846; build, 0.26265263651796966; collaborate, 0.029635339573025835; reply, 0.020273680468829585; cause, 0.020270355199687763; change, 0.020143296854960534; calculate, 0.019807048865472295; deliver, 0.01979857478176975}
EntityMention [type=RESOURCE, objectId=EntityMention-10, hstart=1, hend=2, estart=1, eend=2, headPosition=1, value="Fund", corefID=-1]
EntityMention [type=RESOURCE, objectId=EntityMention-12, hstart=13, hend=14, estart=13, eend=14, headPosition=13, value="RetOue", corefID=-1]
]
RelationMention [type=_NR, start=1, end=9, {_NR, 0.9088620248226259; build, 0.029826907381364745; cause, 0.01048834533846858; reply, 0.010472406713467062; change, 0.010430417119225247; deliver, 0.010107963031033371; calculate, 0.010090071219976819; collaborate, 0.009721864373838134}
EntityMention [type=ROLE, objectId=EntityMention-11, hstart=8, hend=9, estart=8, eend=9, headPosition=8, value="members", corefID=-1]
EntityMention [type=RESOURCE, objectId=EntityMention-10, hstart=1, hend=2, estart=1, eend=2, headPosition=1, value="Fund", corefID=-1]
]
RelationMention [type=_NR, start=8, end=14, {_NR, 0.6412212367693484; build, 0.0795874107991397; deliver, 0.061375929752833555; calculate, 0.061195561682179045; cause, 0.03964100603702037; reply, 0.039577811103586304; change, 0.03870906323316812; collaborate, 0.038691980622724644}
EntityMention [type=ROLE, objectId=EntityMention-11, hstart=8, hend=9, estart=8, eend=9, headPosition=8, value="members", corefID=-1]
EntityMention [type=RESOURCE, objectId=EntityMention-12, hstart=13, hend=14, estart=13, eend=14, headPosition=13, value="RetOue", corefID=-1]
]
RelationMention [type=_NR, start=1, end=14, {_NR, 0.8650327055005457; build, 0.05264799740623545; collaborate, 0.01878896136615606; reply, 0.012762167223115933; cause, 0.01276049397449083; calculate, 0.012671777715382195; change, 0.012668721250994311; deliver, 0.012667175563079464}
EntityMention [type=RESOURCE, objectId=EntityMention-12, hstart=13, hend=14, estart=13, eend=14, headPosition=13, value="RetOue", corefID=-1]
EntityMention [type=RESOURCE, objectId=EntityMention-10, hstart=1, hend=2, estart=1, eend=2, headPosition=1, value="Fund", corefID=-1]
]
RelationMention [type=_NR, start=8, end=14, {_NR, 0.8687007489440899; cause, 0.019732766828364688; reply, 0.0197319383076219; change, 0.019585387681083893; collaborate, 0.019321463597270272; deliver, 0.018836262558606865; calculate, 0.018763499991179922; build, 0.015327932091782685}
EntityMention [type=RESOURCE, objectId=EntityMention-12, hstart=13, hend=14, estart=13, eend=14, headPosition=13, value="RetOue", corefID=-1]
EntityMention [type=ROLE, objectId=EntityMention-11, hstart=8, hend=9, estart=8, eend=9, headPosition=8, value="members", corefID=-1]
]
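For context, the dump above comes from iterating over the relation mentions in the annotated sentence. A minimal sketch of that loop follows, assuming Stanford CoreNLP's RelationExtractorAnnotator; the model path is a placeholder for my serialized classifier, not the actual path:

    import java.util.Properties;
    import edu.stanford.nlp.ie.machinereading.structure.MachineReadingAnnotations;
    import edu.stanford.nlp.ie.machinereading.structure.RelationMention;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    public class RelationDump {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,relation");
        // Placeholder path: point the annotator at the custom serialized RE model.
        props.setProperty("sup.relation.model", "relationExtractor.ser");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation(
            "The Fund's objective is to help our members achieve the best possible RetOue.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
          for (RelationMention rm :
              sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class)) {
            // getType() is the argmax of the probability distribution printed above,
            // so a dominant _NR score hides the lower-scored "build" label.
            System.out.println(rm.getType() + " " + rm.getTypeProbabilities());
          }
        }
      }
    }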
I want to understand what reasons I should look out for here.
Q.1 My assumption was that entity types being recognized accurately will help the relationships get recognized accurately. Is that correct?
Q.2 How can I improve my training data to make sure I get the accurate relationship as the result?
Q.3 Does it matter how many records of each entity type I have defined? Should I maintain an equal number of definitions for each relation type? For example: if my training data has 10 examples of the relationship "build", should I also define 10 examples of each of the other relation types, like "cause", "reply", etc.?
Q.4 My assumption is that the correct NER classification of the entities makes a difference in the relationship extraction. Is that correct?
Upvotes: 0
Views: 240
Reputation: 1468
There are lots of features that RE can use to improve the accuracy of the relationship classification, and these need to be analysed in detail.
Answers to my questions: A.1. Yes, entity types being recognized accurately helps the relationships get recognized accurately. A.2. As far as I know, the training data needs to be annotated and improved manually. A.3. As far as I know, yes, the number of records defined between entities matters. A.4. The NER accuracy makes a difference in the RE accuracy.
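As an illustration of where those features live, here is an abbreviated sketch of a MachineReading training properties file; the paths are placeholders, and the exact keys and feature template names should be verified against MachineReadingProperties and RelationFeatureFactory in the CoreNLP source:

    # roth.properties (abbreviated sketch; paths are placeholders)
    annotators = pos, lemma, parse
    datasetReaderClass = edu.stanford.nlp.ie.machinereading.domains.roth.RothCONLL04Reader
    trainPath = train.corp
    serializedModelPath = relationExtractor.ser
    extractRelations = true
    extractEntities = false
    extractEvents = false
    # Feature templates consumed by RelationFeatureFactory; adding or removing
    # templates here changes what the relation classifier can condition on.
    relationFeatures = arg_words,arg_type,dependency_path_lowlevel,dependency_path_words,surface_path_POS,entities_between_args,full_tree_path

Training is then run with something like java edu.stanford.nlp.ie.machinereading.MachineReading --arguments roth.properties on the CoreNLP classpath.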
Upvotes: 0
Reputation: 5749
Your assumption that good NER information will help is correct, but chances are you'll need much more than 10 training examples. You should be thinking more along the lines of thousands of examples, optimally tens or hundreds of thousands of examples.
But the model should probably be memorizing a 10-sentence training set nonetheless. What are your training examples? Are you using the default features?
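One quick sanity check along those lines is to re-run the trained model over the training sentences and count how many come back with the gold label. A sketch, assuming the CoreNLP relation annotator as above; the gold map and model path are placeholders, not your actual data:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import edu.stanford.nlp.ie.machinereading.structure.MachineReadingAnnotations;
    import edu.stanford.nlp.ie.machinereading.structure.RelationMention;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    public class TrainSetCheck {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,relation");
        props.setProperty("sup.relation.model", "relationExtractor.ser"); // placeholder path
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Placeholder: training sentence -> gold relation label.
        Map<String, String> gold = Map.of(
            "The Fund's objective is to help our members achieve the best possible RetOue.",
            "build");

        int memorized = 0;
        for (Map.Entry<String, String> e : gold.entrySet()) {
          Annotation doc = new Annotation(e.getKey());
          pipeline.annotate(doc);
          for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            List<RelationMention> rms =
                sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class);
            // Count the sentence as memorized if any mention carries the gold label.
            if (rms != null && rms.stream().anyMatch(r -> e.getValue().equals(r.getType()))) {
              memorized++;
            }
          }
        }
        System.out.println("memorized " + memorized + " / " + gold.size() + " training sentences");
      }
    }

If a model this small cannot reproduce its own training labels, that points at the feature set or the annotation format rather than at data volume alone.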
Upvotes: 1