Reputation: 11
I am using spaCy to get POS tags. To make the code faster, I use nlp.pipe and try to disable the components that I do not need. If I disable 'parser', I get very different results for AUXs and VERBs, while the results for NOUNs and ADJs are similar. It seems that we need both 'parser' and 'tagger' to get the correct number of VERBs and AUXs. Is my interpretation correct?
Further, the documentation says that we need 'parser' for lemmatization. But the POS tags seem to depend on 'tagger' as well as on 'parser'. Do we need both 'parser' and 'tagger' for lemmatization, or can I disable 'parser' for lemmatization?
Upvotes: 1
Views: 500
Reputation: 11494
The POS tags come from rules that map `token.tag` to `token.pos` in the `attribute_ruler` component. If the dependency parse is available, there are more specific rules it can apply related to `AUX` and `VERB`. The mapping is hard to do perfectly because the PTB tags in `token.tag` that come from the tagger don't make an aux/verb distinction at all.
If you need POS tags with `en_core_web_*`, you need at least `tok2vec`+`tagger`+`attribute_ruler`. You can optionally add `parser`, but it's not required.
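As a rough illustration of the minimum pipeline described above, here is a sketch that counts coarse POS tags with and without the parser. It assumes spaCy v3 with `en_core_web_sm` installed. The helper names (`components_to_disable`, `pos_counts`, `main`) and the sample sentences are made up for this example and are not part of spaCy.

```python
from collections import Counter

# Minimum components for POS tags with en_core_web_*, as listed above.
REQUIRED_FOR_POS = ("tok2vec", "tagger", "attribute_ruler")


def components_to_disable(pipe_names, keep=REQUIRED_FOR_POS):
    """Hypothetical helper: return the pipeline components not in `keep`."""
    return [name for name in pipe_names if name not in keep]


def pos_counts(nlp, texts, disable=()):
    """Count coarse POS tags (token.pos_) over `texts`.

    `disable` lists the component names to skip for this nlp.pipe call.
    """
    counts = Counter()
    for doc in nlp.pipe(texts, disable=list(disable)):
        counts.update(token.pos_ for token in doc)
    return counts


def main():
    import spacy  # assumption: spaCy v3 and en_core_web_sm are installed

    nlp = spacy.load("en_core_web_sm")
    texts = ["She has been reading the book.", "They were happy to help."]

    with_parser = pos_counts(nlp, texts, disable=["ner", "lemmatizer"])
    minimal = pos_counts(nlp, texts, disable=components_to_disable(nlp.pipe_names))

    # AUX/VERB counts may differ between the two runs; NOUN/ADJ usually do not.
    for tag in ("AUX", "VERB", "NOUN", "ADJ"):
        print(tag, with_parser[tag], minimal[tag])
```

Calling `main()` prints the counts side by side, so you can check on your own data how much the `AUX`/`VERB` split changes once the parser is disabled.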
A full description of the pipeline design is here: https://spacy.io/models#design
Upvotes: 1