SecML

Reputation: 29

Training multi-word verb and noun entities with spaCy NER

All the NER training examples I have come across are nouns. Is it possible to train spaCy's NER on entities that are verb and noun combinations, for example 'stirring pot'?

Do I use a noun-based NER first and then train a nested NER on such phrases, or do I train on the phrase directly in spaCy's NER? I guess the answer depends on whether spaCy's NER uses POS and dependency features as part of its training.

Upvotes: 0

Views: 684

Answers (1)

syllogism_

Reputation: 4297

NER technologies usually work best when the entities are fairly short, and when there are clear clues at the starts and ends of the phrases. These are both the case for recognising proper nouns in English, which is the canonical use-case the algorithms were developed for.

A noun phrase like "stepping stone" or "deciding factor" will be easy for an NER system to learn. The system will be worse at recognising verb + object constructions, because the verb and object might be arbitrarily far apart, e.g. "stirring the pot", "stirring the metal pot", "stir the pot vigorously", etc. You should also be a bit wary of applying sequential labelling to arbitrary spans of text that aren't syntactic constituents. It'll be very difficult to describe where the boundaries of the phrases should fall, so your annotators probably won't behave consistently. Indecision about the exact boundaries will make the NER system perform very poorly, because spans which differ by one word are seen as entirely different spans by the loss function.
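To make the boundary point concrete, here is a minimal sketch in plain Python of exact-match span comparison. It is not spaCy's actual loss function, just an illustration of the same all-or-nothing behaviour; the `ACTION` label, the example sentence, and the helper name `exact_match_f1` are made up for the example.

```python
def exact_match_f1(gold, predicted):
    """F1 over spans, counting only exact (start, end, label) matches."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence; spans are (start_token, end_token_exclusive, label).
tokens = ["She", "kept", "stirring", "the", "metal", "pot", "slowly"]

gold = [(2, 6, "ACTION")]        # "stirring the metal pot"
off_by_one = [(2, 7, "ACTION")]  # "stirring the metal pot slowly"
exact = [(2, 6, "ACTION")]       # identical boundaries

score_off = exact_match_f1(gold, off_by_one)
score_exact = exact_match_f1(gold, exact)
print(score_off, score_exact)
```

The span that overshoots by a single token gets no partial credit at all, which is why two annotators who disagree on whether "slowly" belongs in the phrase effectively give the model contradictory examples.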

Finally, to answer your question about the POS and dependency parsing features: no, we don't use these in the NER at the moment.

You might be interested in the dependency tree matcher contributed in these two pull requests:

More improvements to the Matcher will also help you: https://github.com/explosion/spaCy/issues/1971
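As a rough illustration of why matching over the dependency tree suits verb + object phrases better than labelling a contiguous span, here is a small sketch in plain Python. It does not use spaCy's API: the parses are written by hand, the `Token` type and `find_verb_object_pairs` helper are hypothetical, and the `dobj` label is assumed to mark the direct object.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    lemma: str
    head: int  # index of the syntactic head; the root points to itself
    dep: str   # dependency label relative to the head


def find_verb_object_pairs(tokens, verb_lemma, object_lemma):
    """Return (verb_index, object_index) pairs linked by a direct-object arc."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.dep == "dobj" and tok.lemma == object_lemma:
            if tokens[tok.head].lemma == verb_lemma:
                pairs.append((tok.head, i))
    return pairs


# Hand-written parses (assumed, not produced by a real parser).
short = [  # "stirring the pot"
    Token("stirring", "stir", 0, "ROOT"),
    Token("the", "the", 2, "det"),
    Token("pot", "pot", 0, "dobj"),
]
longer = [  # "stir the metal pot vigorously"
    Token("stir", "stir", 0, "ROOT"),
    Token("the", "the", 3, "det"),
    Token("metal", "metal", 3, "compound"),
    Token("pot", "pot", 0, "dobj"),
    Token("vigorously", "vigorously", 0, "advmod"),
]

short_pairs = find_verb_object_pairs(short, "stir", "pot")
longer_pairs = find_verb_object_pairs(longer, "stir", "pot")
print(short_pairs, longer_pairs)
```

The same rule finds the verb and its object in both sentences, however many words sit between them, and it never has to decide where the "phrase" ends.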

Upvotes: 1
