Reputation: 15
From an unstructured text, I have extracted all the necessary entities and stored them in a dictionary using the Stanford POS tagger. Now I want to extract the relations between them to build my own ontology in the form of triplets (Entity1, Entity2, relation). I tried the Stanford dependency parser, but I don't know how to extract these triplets from its output.
For example: The front diffusers comprise pivotable flaps that are arranged between boundary walls of air ducts.
I want to have the relation (front diffusers, pivotable flaps, comprise); (pivotable flaps, boundary walls of air ducts, arrange);
Another example: The cargo body comprises a container having a floor, a top wall, a front wall, side walls and a rear door.
My expected relations are (cargo body, container, comprise); (container, floor, have); (container, top wall, have); (container, front wall, have); (container, side walls, have); (container, rear door, have).
What can I do with the Stanford dependency parser to achieve my goal? That is, how do I navigate the dependency parse tree to get these results?
Upvotes: 1
Views: 800
Reputation: 2554
You are on the right path with the dependency parser. You just need to dig a little deeper to extract the structure you are looking for. From what I can see, the parser output has all the information that you need. Here is the phrase-structure parse of your first sentence:
(ROOT
  (S
    (NP (DT The) (JJ front) (NNS diffusers))
    (VP (VBP comprise)
      (NP
        (NP (JJ pivotable) (NNS flaps))
        (SBAR
          (WHNP (WDT that))
          (S
            (VP (VBP are)
              (VP (VBN arranged)
                (PP (IN between)
                  (NP
                    (NP (NN boundary) (NNS walls))
                    (PP (IN of)
                      (NP (NN air) (NNS ducts)))))))))))
    (. .)))
And here are the typed dependencies you actually need, straight from the parser itself:
nsubj(comprise-4, diffusers-3)
root(ROOT-0, comprise-4)
amod(flaps-6, pivotable-5)
dobj(comprise-4, flaps-6)
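Here nsubj and dobj share the head comprise, so they give you the subject and the object of that relation, and amod attaches the modifier pivotable to flaps. Together that is enough to assemble (diffusers, pivotable flaps, comprise). Study the output for different sentences and you will be able to extract the info in whichever format you wish to get it.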
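As a rough illustration of that idea, here is a minimal Python sketch that works on the textual dependency lines shown above. It is not part of the Stanford tools: the function names parse_dependencies and extract_triples are made up for this example, and it assumes the parser output has already been captured as strings of the form rel(head-i, dependent-j). It pairs every nsubj with every dobj of the same verb and prepends noun modifiers (amod, nn, compound) to each head word.

```python
import re
from collections import defaultdict

# Matches one typed-dependency line such as "nsubj(comprise-4, diffusers-3)".
DEP_RE = re.compile(r"([\w:]+)\((.+?)-(\d+), (.+?)-(\d+)\)")


def parse_dependencies(lines):
    """Turn 'rel(head-i, dep-j)' strings into (rel, (head, i), (dep, j)) tuples."""
    deps = []
    for line in lines:
        match = DEP_RE.fullmatch(line.strip())
        if match:
            rel, head, head_idx, dep, dep_idx = match.groups()
            deps.append((rel, (head, int(head_idx)), (dep, int(dep_idx))))
    return deps


def extract_triples(deps, modifier_rels=("amod", "nn", "compound")):
    """Build (subject, object, verb) triples from nsubj/dobj pairs that share a head."""
    subjects = defaultdict(list)
    objects = defaultdict(list)
    modifiers = defaultdict(list)
    for rel, head, dep in deps:
        if rel == "nsubj":
            subjects[head].append(dep)
        elif rel == "dobj":
            objects[head].append(dep)
        elif rel in modifier_rels:
            modifiers[head].append(dep)

    def phrase(token):
        # Put the head word and its modifiers back in sentence order.
        words = modifiers[token] + [token]
        return " ".join(word for word, _ in sorted(words, key=lambda t: t[1]))

    triples = []
    for verb, subjs in subjects.items():
        for subj in subjs:
            for obj in objects.get(verb, []):
                triples.append((phrase(subj), phrase(obj), verb[0]))
    return triples


sample = [
    "nsubj(comprise-4, diffusers-3)",
    "root(ROOT-0, comprise-4)",
    "amod(flaps-6, pivotable-5)",
    "dobj(comprise-4, flaps-6)",
]
triples = extract_triples(parse_dependencies(sample))
print(triples)  # [('diffusers', 'pivotable flaps', 'comprise')]
```

With only the four lines listed above, the subject comes out as "diffusers"; if the parser output also contains a modifier line attaching "front" to "diffusers", the same code would produce "front diffusers". Relations expressed through prepositions or passives (such as "arranged between") need extra rules on top of this.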
Upvotes: 1