Phoeniyx

Interactive NLP part-of-speech (POS) tagging - forcing certain terms to be a particular tag

I am trying to perform POS tagging, and I am open to any Java-based tagger (I am currently using OpenNLP). Is there a way to "force" the tagger to recognize a particular term (or combination of words) as a particular tag and classify the other words based on this? The goal is to allow "interactive correction" of the tagging. Because such correction is interactive, fully retraining the tagger in real time with this new information is not really practical.
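Conceptually, fixing one tag and letting the others adjust does not have to mean retraining: many sequence taggers pick the best tag sequence with a search over the whole sentence, and a user's correction can be imposed during that search by allowing only the chosen tag at that position. The sketch below shows this for a toy HMM-style tagger with Viterbi decoding, assuming access to per-token emission scores and tag-transition scores. It is not an OpenNLP or CoreNLP API; the class name, the three-tag set and every log-probability are invented for illustration, and whether a given library lets you constrain its decoder is something to check in its own documentation.

```java
import java.util.Arrays;
import java.util.Map;

/** Toy Viterbi decoder that honors user-forced tags. Not a real library API. */
class ConstrainedViterbi {

    // Toy model with invented log-probabilities (hypothetical, for illustration only).
    static final String[] TAGS = {"NN", "VBG", "VBZ"};
    static final String[] WORDS = {"swimming", "relaxes"};
    static final double[] START = {-1.0, -1.0, -3.0};
    static final double[][] TRANS = {
        {-2.0, -5.0, -0.5},   // from NN  to NN, VBG, VBZ
        {-0.5, -5.0, -2.5},   // from VBG to NN, VBG, VBZ
        {-5.0, -5.0, -5.0}};  // from VBZ to NN, VBG, VBZ
    static final double[][] EMIT = {
        {-2.0, -0.5, -10.0},  // "swimming" given NN, VBG, VBZ
        {-1.5, -10.0, -1.6}}; // "relaxes"  given NN, VBG, VBZ

    /**
     * @param start  start[t]: log P(first tag = t)
     * @param trans  trans[a][b]: log P(next tag = b | previous tag = a)
     * @param emit   emit[i][t]: log P(word i | tag t)
     * @param forced token position -> the only tag index allowed there
     * @return the best tag index per position among sequences that respect the forced tags
     */
    static int[] decode(double[] start, double[][] trans, double[][] emit,
                        Map<Integer, Integer> forced) {
        int n = emit.length, k = start.length;
        double[][] score = new double[n][k];
        int[][] back = new int[n][k];
        for (double[] row : score) Arrays.fill(row, Double.NEGATIVE_INFINITY);

        for (int t = 0; t < k; t++) {
            if (allowed(forced, 0, t)) score[0][t] = start[t] + emit[0][t];
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                if (!allowed(forced, i, t)) continue; // the constraint: skip disallowed tags
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + trans[p][t] + emit[i][t];
                    if (s > score[i][t]) { score[i][t] = s; back[i][t] = p; }
                }
            }
        }
        // Best final tag, then follow the back-pointers to recover the path.
        int best = 0;
        for (int t = 1; t < k; t++) if (score[n - 1][t] > score[n - 1][best]) best = t;
        int[] path = new int[n];
        path[n - 1] = best;
        for (int i = n - 1; i > 0; i--) path[i - 1] = back[i][path[i]];
        return path;
    }

    private static boolean allowed(Map<Integer, Integer> forced, int pos, int tag) {
        Integer f = forced.get(pos);
        return f == null || f == tag; // unforced positions allow every tag
    }

    static void print(int[] path) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.length; i++) {
            sb.append(WORDS[i]).append('/').append(TAGS[path[i]]).append(' ');
        }
        System.out.println(sb.toString().trim());
    }

    public static void main(String[] args) {
        print(decode(START, TRANS, EMIT, Map.of()));     // no user correction
        print(decode(START, TRANS, EMIT, Map.of(0, 0))); // user: "swimming" is NN
    }
}
```

With these made-up numbers the unconstrained run prints `swimming/VBG relaxes/NN`, while forcing position 0 to `NN` prints `swimming/NN relaxes/VBZ`: the correction changes the neighboring tag as well. The sentence is cut to two words so the arithmetic can be checked by hand.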

So, for example, consider the sentence "I never gave swimming in the lake all that much effort". Here, "swimming" is a gerund (a noun, as opposed to a verb), and the user may say that "swimming in the lake" is a noun (in the context of this whole sentence). If he specifies this, it's not good if the tagger spits out "lake" as a separate noun, since "lake" is already part of "swimming in the lake".

What do you guys think is the best way to go about this? Is there an API call, or will I just have to substitute "swimming in the lake" with something else before tagging? I don't think the latter approach is as reliable, though, since I would still be relying on the tagger to tag it properly when the user has told me exactly what it should be. Thanks.
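The substitution approach can at least be arranged so that the user's label always wins for the span itself: collapse the span into a single placeholder token before tagging, then discard whatever tag the tagger gives the placeholder and put the user's tag back. In this minimal sketch the tagger is any `Function<String[], String[]>`; `lexiconTagger` is a hypothetical dictionary-based stand-in, and a real tagger with a `String[] tag(String[])` method (OpenNLP's `POSTaggerME` has one) could be passed as a method reference instead. The placeholder word is an assumption too: the surrounding tags still depend on how the tagger treats it, so a word of the intended category (here the common noun "exercise") seems the safer choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PlaceholderTagging {

    /**
     * Tags a sentence in which tokens[from..to) were fixed by the user.
     * The span is replaced by one placeholder token before tagging, and the
     * placeholder's predicted tag is thrown away in favor of userTag.
     */
    static List<String> tagWithFixedSpan(String[] tokens, int from, int to, String userTag,
                                         String placeholder, Function<String[], String[]> tagger) {
        if (from < 0 || from >= to || to > tokens.length) {
            throw new IllegalArgumentException("bad span");
        }
        List<String> all = Arrays.asList(tokens);
        List<String> input = new ArrayList<>(all.subList(0, from));
        input.add(placeholder);                        // placeholder sits at index `from`
        input.addAll(all.subList(to, tokens.length));

        String[] tags = tagger.apply(input.toArray(new String[0]));
        if (tags.length != input.size()) throw new IllegalStateException("tagger returned wrong length");

        List<String> out = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            if (i == from) {
                // Restore the original phrase and use the user's tag, not the tagger's.
                out.add(String.join(" ", all.subList(from, to)) + "/" + userTag);
            } else {
                out.add(input.get(i) + "/" + tags[i]);
            }
        }
        return out;
    }

    /** Hypothetical stand-in for a real tagger: dictionary lookup, "NN" by default. */
    static String[] lexiconTagger(String[] toks) {
        Map<String, String> lexicon = Map.of("I", "PRP", "never", "RB", "gave", "VBD",
                                             "all", "PDT", "that", "DT", "much", "JJ");
        return Arrays.stream(toks).map(w -> lexicon.getOrDefault(w, "NN")).toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] tokens = "I never gave swimming in the lake all that much effort".split(" ");
        System.out.println(tagWithFixedSpan(tokens, 3, 7, "NN", "exercise",
                                            PlaceholderTagging::lexiconTagger));
    }
}
```

This prints `[I/PRP, never/RB, gave/VBD, swimming in the lake/NN, all/PDT, that/DT, much/JJ, effort/NN]`. The span's tag never comes from the tagger, so only the context tags carry the reliability concern raised above.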

Answers (1)

Gabor Angeli

If you want to have other labels change around a given fixed POS tag, there is (as far as I know) no way to do this in CoreNLP without retraining the tagger.

But it sounds like what you want here is actually a tokenization difference: "swimming in the lake" is a noun phrase rather than a noun, and no matter how you train the POS tagger, it will tag the four words in the phrase independently. One thing you can do is use a chunker (I think OpenNLP has one) or a parser to extract these noun phrases; in fact, a parser should correctly guess that the span is a noun phrase, even if the POS tagger messed up.
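OpenNLP's `ChunkerME` returns one BIO-style chunk tag per token (`B-NP`, `I-NP`, `O` and so on) from its `chunk(String[] toks, String[] tags)` method, so turning its output into phrases is mostly a matter of grouping. A minimal sketch of that grouping step follows; the chunk tags in `main` are written by hand for the question's sentence rather than produced by a real model. One caveat: CoNLL-2000-style chunkers produce flat, non-nested chunks, so a prepositional phrase such as "in the lake" normally ends the preceding chunk and "swimming in the lake" would not come out as a single NP; that nested structure is what a full parser provides.

```java
import java.util.ArrayList;
import java.util.List;

class NounPhrases {

    /** Groups consecutive B-NP / I-NP tokens into noun-phrase strings. */
    static List<String> extract(String[] tokens, String[] chunkTags) {
        if (tokens.length != chunkTags.length) throw new IllegalArgumentException("length mismatch");
        List<String> phrases = new ArrayList<>();
        StringBuilder current = null;   // the NP being built, or null outside an NP
        for (int i = 0; i < tokens.length; i++) {
            String tag = chunkTags[i];
            if (tag.equals("I-NP") && current != null) {
                current.append(' ').append(tokens[i]);   // continue the open NP
                continue;
            }
            if (current != null) phrases.add(current.toString());   // close the previous NP
            // A B-NP (or a stray I-NP with no open phrase) starts a new NP; anything else does not.
            current = (tag.equals("B-NP") || tag.equals("I-NP")) ? new StringBuilder(tokens[i]) : null;
        }
        if (current != null) phrases.add(current.toString());
        return phrases;
    }

    public static void main(String[] args) {
        String[] tokens = "I never gave swimming in the lake all that much effort".split(" ");
        // Hand-written chunk tags in the BIO style a chunker emits (not real model output).
        String[] chunks = {"B-NP", "O", "B-VP", "B-NP", "B-PP", "B-NP", "I-NP",
                           "B-NP", "I-NP", "I-NP", "I-NP"};
        System.out.println(extract(tokens, chunks));
    }
}
```

This prints `[I, swimming, the lake, all that much effort]`, which illustrates the caveat: with flat chunks, "the lake" is its own NP.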
