Reputation: 201
I have a set of URLs in a text file. For each URL in that file, I want to tag the entities and relationships in the text of the page it points to.
I am aware of entity taggers such as Stanford NER, NLTK, and GATE, which can perform the entity tagging. However, I am more interested in relationship extraction.
To extract relationships, I am thinking of annotating the text from those URLs to use as training data, but I do not want to annotate it manually. I could write a few regexes to extract the relationships I want, but that would be difficult to scale up.
Is there a tool in which I can specify what I want to annotate?
For example:
"Rob is working as the Director of ABC organization. He graduated from XYZ University."
Here, I want to extract the affiliation relationship, so intuitively I would like to annotate the words that signal an affiliation, such as "working" and "graduated".
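To make the idea concrete, here is a rough Python sketch of the kind of trigger-word annotation described above. The trigger list, the `annotate_triggers` name, and the span format are placeholders made up for illustration, not part of any existing tool:

```python
import re

# Hypothetical trigger words for the "affiliation" relation; extend as needed.
AFFILIATION_TRIGGERS = ["working", "graduated"]
TRIGGER_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, AFFILIATION_TRIGGERS)) + r")\b",
    re.IGNORECASE,
)

def annotate_triggers(text):
    """Return (start, end, word) character spans for every trigger word in text."""
    return [(m.start(), m.end(), m.group(0)) for m in TRIGGER_RE.finditer(text)]

sample = ("Rob is working as the Director of ABC organization. "
          "He graduated from XYZ University.")
spans = annotate_triggers(sample)
for start, end, word in spans:
    print(start, end, word)
```

This only marks the trigger words themselves; it does not link them to the entities they relate, which is the part that does not scale with hand-written patterns.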
Edit: By "a set of URLs in the text file", I mean that the text file holds about 200 links to webpages, each of which contains some text. I want to analyse (annotate) that text.
Upvotes: 0
Views: 746
Reputation: 437
There is no PR (processing resource) in GATE that will pair arguments and create instances for you. You must therefore create the instances that are relevant to your problem yourself.
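As an illustration of what "pairing arguments" could look like, here is a minimal sketch in plain Python (not GATE or JAPE code). It assumes the entities have already been tagged and exported as dictionaries with `text`, `type`, and `sentence` keys; that layout, the type names, and the `candidate_instances` helper are all assumptions made for this example:

```python
from itertools import permutations

def candidate_instances(entities, arg1_type="Person", arg2_type="Organization"):
    """Pair each arg1-typed entity with each arg2-typed entity in the same sentence."""
    pairs = []
    for a, b in permutations(entities, 2):
        same_sentence = a["sentence"] == b["sentence"]
        if same_sentence and a["type"] == arg1_type and b["type"] == arg2_type:
            pairs.append((a["text"], b["text"]))
    return pairs

# Hypothetical output of an entity tagger for the example in the question.
entities = [
    {"text": "Rob", "type": "Person", "sentence": 0},
    {"text": "ABC organization", "type": "Organization", "sentence": 0},
    {"text": "XYZ University", "type": "Organization", "sentence": 1},
]

instances = candidate_instances(entities)
print(instances)
```

Each pair is one candidate instance that you then label as a positive or negative example of the relation. Note that "He" in the second sentence is not resolved to "Rob" here, so no candidate is produced for "XYZ University"; handling that would need coreference resolution.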
You can:
You can probably split your corpus into a training set and a test set.
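One simple way to do such a split, sketched in plain Python outside of GATE. The 80/20 ratio, the fixed seed, and the `split_corpus` name are arbitrary choices for this example, and the URLs are placeholders:

```python
import random

def split_corpus(documents, test_fraction=0.2, seed=42):
    """Shuffle the documents reproducibly and return (train, test) lists."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)  # local RNG, so global random state is untouched
    n_test = round(len(docs) * test_fraction)
    return docs[n_test:], docs[:n_test]

# Placeholder identifiers standing in for the roughly 200 pages in the question.
urls = [f"http://example.com/page{i}" for i in range(200)]
train, test = split_corpus(urls)
print(len(train), len(test))
```

Splitting at the document level, rather than at the sentence level, keeps sentences from the same page from ending up on both sides of the split.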
You can use the GATE training course on Relation Extraction, which contains all you need:
Upvotes: 1