Reputation: 952
I am trying to build a knowledge base using text mining. I am using the GENIA corpus to tag words with their parts of speech. Given two terms from the text, how do I create a model that finds the relation between them?
Example text:
HIF1A gene is involved in Hypoxic regulation. Hypoxia also up regulates BRCA1 gene expression which is mainly associated in breast cancer.
I already have the POS tags:
Word        Base Form   Part-Of-Speech
HIF1A       HIF1A       NN
gene        gene        NN
is          be          VBZ
involved    involve     VBN
in          in          IN
Hypoxic     Hypoxic     JJ
regulation  regulation  NN
.           .           .
Hypoxia     Hypoxia     NN
also        also        RB
regulates   regulate    VBZ
BRCA1       BRCA1       NN
gene        gene        NN
which       which       WDT
is          be          VBZ
mainly      mainly      RB
associated  associate   VBN
in          in          IN
breast      breast      NN
cancer      cancer      NN
I am writing a web interface that, when queried with BRCA1 and Hypoxia, should report that there is positive regulation between them. When queried with HIF1A and Hypoxia, it should likewise report a positive regulation, based on these sentences.
Now that I have the POS tags, I don't know how to proceed in creating a model that identifies the relation between two terms. This is just an example; I want to do it for general biomedical terms and texts.
Any suggestions?
Upvotes: 1
Views: 100
Reputation: 1246
If you rely solely on the output of a POS tagger, you'll have to define local grammar rules (patterns) over the tag sequences.
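To illustrate what such a local rule might look like, here is a minimal sketch in Python. It assumes the tokens are available as (word, base form, POS) triples, as in the question's table, and it uses a hypothetical trigger lexicon (`TRIGGERS`) plus one hand-written pattern: a run of nouns, optional adverbs, a trigger verb, then another run of nouns. The function names are made up for illustration and a real system would need many more patterns.

```python
# Tokens copied from the question's table: (word, base form, POS tag).
TOKENS = [
    ("Hypoxia", "Hypoxia", "NN"),
    ("also", "also", "RB"),
    ("regulates", "regulate", "VBZ"),
    ("BRCA1", "BRCA1", "NN"),
    ("gene", "gene", "NN"),
]

# Hypothetical trigger lexicon: verb base forms that signal a relation.
TRIGGERS = {"regulate"}


def noun_run(tokens, start, step):
    """Collect contiguous NN* words from `start`, moving by `step` (+1 or -1)."""
    words = []
    i = start
    while 0 <= i < len(tokens) and tokens[i][2].startswith("NN"):
        words.append(tokens[i][0])
        i += step
    # When scanning leftwards, restore the original word order.
    return words if step > 0 else words[::-1]


def extract_relations(tokens):
    """Apply the pattern: NN+ RB* <trigger verb> NN+."""
    relations = []
    for i, (_word, lemma, pos) in enumerate(tokens):
        if not (pos.startswith("VB") and lemma in TRIGGERS):
            continue
        left = i - 1
        # Skip adverbs such as "also" between the subject and the verb.
        while left >= 0 and tokens[left][2] == "RB":
            left -= 1
        subject = noun_run(tokens, left, -1)
        obj = noun_run(tokens, i + 1, +1)
        if subject and obj:
            relations.append((lemma, " ".join(subject), " ".join(obj)))
    return relations


print(extract_relations(TOKENS))
```

Rules like this are brittle: passives, relative clauses, and long-distance arguments each need their own pattern.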
Personally, I would suggest using a (syntactic) parser to get argument structures like regulate(Hypoxia, BRCA1).
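As a rough sketch of how an argument structure can be read off a parse, the Python snippet below works on a hand-written dependency analysis of "Hypoxia also regulates BRCA1 gene". The parse itself is an assumption (it is not the output of any particular parser), and the relation labels `nsubj`, `dobj`, and `compound` follow a common dependency-label convention that your parser may name differently.

```python
from collections import namedtuple

# One token of a dependency parse: position, word, base form,
# index of the head token (-1 for the root), and relation label.
Token = namedtuple("Token", "idx word lemma head deprel")

# Hand-written parse for illustration only (an assumption, not parser output).
PARSE = [
    Token(0, "Hypoxia", "Hypoxia", 2, "nsubj"),
    Token(1, "also", "also", 2, "advmod"),
    Token(2, "regulates", "regulate", -1, "root"),
    Token(3, "BRCA1", "BRCA1", 4, "compound"),
    Token(4, "gene", "gene", 2, "dobj"),
]


def phrase(parse, head):
    """Return the head word together with its compound modifiers, in order."""
    parts = [
        t for t in parse
        if t.idx == head.idx or (t.head == head.idx and t.deprel == "compound")
    ]
    return " ".join(t.word for t in sorted(parts, key=lambda t: t.idx))


def argument_structures(parse):
    """Yield (predicate, subject, object) for every token with both arguments."""
    structures = []
    for verb in parse:
        subjects = [t for t in parse if t.head == verb.idx and t.deprel == "nsubj"]
        objects = [t for t in parse if t.head == verb.idx and t.deprel == "dobj"]
        for subj in subjects:
            for obj in objects:
                structures.append(
                    (verb.lemma, phrase(parse, subj), phrase(parse, obj))
                )
    return structures


for pred, subj, obj in argument_structures(PARSE):
    print(f"{pred}({subj}, {obj})")
```

Because the arguments come from grammatical relations rather than word order, the same function would still find the subject and object when adverbs or other material intervene.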
...
Upvotes: 2