Is it possible to do lemmatization independently in spaCy?

I'm using spaCy to preprocess data for sentiment analysis.

What I want to do is:

1) Lemmatization
2) POS tagging on lemmatized words

But since spaCy runs the whole pipeline at once when the parser is called, it ends up doing all the calculations twice. Is there an option to disable the calculations I don't need?

Upvotes: 1

Views: 548

Answers (1)

syllogism_
Reputation: 4297

Have a look at the Language.__call__ method to see how the various processes are applied in sequence. There aren't many; it's basically:

doc = nlp.tokenizer(text)
nlp.tagger(doc)
nlp.parser(doc)
nlp.entity(doc)

If you need a different sequence, you should just write your own function to string them together differently.
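
As a rough sketch of that idea, the function below applies only the components you pass in, in the order given. The `run_pipeline` helper and the toy components are hypothetical names made up for illustration, so that the example runs without a model loaded; they are not part of spaCy.

```python
def run_pipeline(text, tokenizer, components):
    """Tokenize text, then apply each component to the doc in order."""
    doc = tokenizer(text)
    for component in components:
        # Components that modify the doc in place (as in the snippet above)
        # return None, so keep the existing doc in that case.
        result = component(doc)
        if result is not None:
            doc = result
    return doc


# Stand-ins for illustration only; these are not spaCy objects.
def toy_tokenizer(text):
    return [{"text": word} for word in text.split()]


def toy_tagger(doc):
    for token in doc:
        token["tag"] = "NUM" if token["text"].isdigit() else "X"


def toy_parser(doc):
    for token in doc:
        token["parsed"] = True


# Run the tagger only, skipping the parser step entirely.
doc = run_pipeline("I have 2 cats", toy_tokenizer, [toy_tagger])
```

With the attributes shown in the snippet above, the equivalent call would be roughly `run_pipeline(text, nlp.tokenizer, [nlp.tagger])`, assuming your spaCy version exposes them under those names.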

I'm not sure that what you're asking for makes sense, though. If you apply the POS tagger to lemmatized text, the statistical model probably won't perform very well. The inflectional suffixes are important features.
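
To make the point about suffixes concrete, here is a toy illustration. The `suffix` function is a hand-rolled stand-in for a suffix feature, not spaCy's actual feature extraction, and the word pairs are picked by hand.

```python
def suffix(word, n=3):
    """Return the last n characters of a word, lowercased."""
    return word[-n:].lower()


# (inflected form, lemma) pairs, chosen by hand for illustration
pairs = [("walked", "walk"), ("walking", "walk"), ("walks", "walk")]

# Suffixes seen on the original words versus on their lemmas
inflected_suffixes = {suffix(form) for form, _ in pairs}
lemma_suffixes = {suffix(lemma) for _, lemma in pairs}

print(sorted(inflected_suffixes))
print(sorted(lemma_suffixes))
```

The three inflected forms each carry a different ending, while their lemmas all collapse to the same one, so a model that relies on endings to tell past tense from a gerund has nothing left to work with after lemmatization.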

Upvotes: 3
