Extracting triplet from unstructured text

Question

I need to extract simple triplets from unstructured text. They are usually of the form noun-verb-noun, so I have tried POS tagging and then extracting nouns and verbs from the neighbourhood. However, this leads to a lot of special cases and gives low accuracy. Will syntactic/semantic parsing help in this scenario?

Will ontology-based information extraction be more useful?

ejb · Accepted Answer

I expect that syntactic parsing would be the best fit for your scenario. Some trivial template-matching method with POS tags might work, where you find verbs preceded and followed by a single noun, and take the former to be the subject and the latter the object. However, it sounds like you've already tried something like that -- unless your neighborhood extraction ignores word order (which would be a bit silly - you'd be guessing which noun was the subject and which was the object, and that's assuming exactly two nouns in each sentence).
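To make the template idea concrete, here is a minimal sketch. It assumes the text has already been POS-tagged into (word, tag) pairs with coarse tags such as NOUN, PROPN and VERB; the function name `svo_by_template` and the hand-tagged sentences are made up for illustration and don't come from any particular tagger.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, coarse POS tag), assumed to come from some tagger

NOUN_TAGS = {"NOUN", "PROPN"}  # assumption: the tagger uses these coarse tag names


def svo_by_template(tagged: List[Tagged]) -> List[Tuple[str, str, str]]:
    """For each verb with exactly one noun before it and one after it,
    guess (noun before, verb, noun after) as (subject, verb, object)."""
    triplets = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "VERB":
            continue
        before = [w for w, t in tagged[:i] if t in NOUN_TAGS]
        after = [w for w, t in tagged[i + 1:] if t in NOUN_TAGS]
        if len(before) == 1 and len(after) == 1:
            triplets.append((before[0], word, after[0]))
    return triplets


# Hand-tagged example sentences (hypothetical tagger output).
active = [("John", "PROPN"), ("eats", "VERB"), ("the", "DET"), ("hamburger", "NOUN")]
passive = [("The", "DET"), ("hamburger", "NOUN"), ("was", "AUX"),
           ("eaten", "VERB"), ("by", "ADP"), ("John", "PROPN")]

print(svo_by_template(active))
print(svo_by_template(passive))
```

The active sentence comes out as expected, but the passive one puts "hamburger" in the subject slot, because the template only looks at word order. Sentences with more than two nouns are skipped entirely, which is likely where much of the accuracy loss comes from.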

Since you're looking for {s, v, o} triplets, chances are you won't need semantic or ontological information. That would be useful if you wanted more information, e.g. agent-patient relations or deeper knowledge extraction.

{s,v,o} is shallow syntactic information, and given that syntactic parsing is considerably more robust and accessible than semantic parsing, that might be your best bet. A syntactic parse captures simple word re-orderings such as the passive, e.g. "The hamburger was eaten by John." => {John, eat, hamburger}; you'd also be able to specifically handle intransitive and ditransitive verbs, which might be issues for a more naive approach.
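As a rough sketch of how the triplet falls out of a dependency parse, the snippet below works on a hand-written parse, not real parser output. The `Token` class, the helper names, and the label set (`nsubj`, `dobj`, `nsubjpass`, `agent`, `pobj`, `ROOT`) are assumptions for illustration; your parser may use different labels (for example `obj` and `nsubj:pass`), so adjust accordingly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    text: str
    lemma: str
    dep: str   # dependency label relative to the head (assumed label set)
    head: int  # index of the head token; the root points to itself


def child(tokens: List[Token], head_idx: int, label: str) -> Optional[int]:
    """Index of the first dependent of head_idx carrying the given label."""
    for i, tok in enumerate(tokens):
        if i != head_idx and tok.head == head_idx and tok.dep == label:
            return i
    return None


def svo_from_parse(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Extract (subject, verb, object) lemmas for each root verb."""
    triplets = []
    for v, tok in enumerate(tokens):
        if tok.dep != "ROOT":
            continue
        subj = child(tokens, v, "nsubj")
        obj = child(tokens, v, "dobj")
        if subj is None:
            # Passive: the surface subject is the logical object,
            # and the logical subject sits under the "by" agent phrase.
            agent = child(tokens, v, "agent")
            if agent is not None:
                subj = child(tokens, agent, "pobj")
            obj = child(tokens, v, "nsubjpass")
        # Intransitive verbs have no object, so they yield no triplet here.
        if subj is not None and obj is not None:
            triplets.append((tokens[subj].lemma, tok.lemma, tokens[obj].lemma))
    return triplets


# Hand-written parse of "The hamburger was eaten by John" (hypothetical parser output).
passive = [
    Token("The", "the", "det", 1),
    Token("hamburger", "hamburger", "nsubjpass", 3),
    Token("was", "be", "auxpass", 3),
    Token("eaten", "eat", "ROOT", 3),
    Token("by", "by", "agent", 3),
    Token("John", "John", "pobj", 4),
]

print(svo_from_parse(passive))
```

Because roles are read off the dependency labels and not off word order, the passive sentence yields John as the subject. The sketch only looks at the root verb and ignores ditransitives, conjunctions and embedded clauses, which would each need their own handling.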
