Reputation: 1825
spaCy's POS tagger is usually applied to entire sentences. Is there a way to efficiently apply unigram POS tagging to a single word (or a list of single words)?
Something like this:
words = ["apple", "eat", "good"]
tags = get_tags(words)
print(tags)
> ["NNP", "VB", "JJ"]
Thanks.
Upvotes: 1
Views: 2398
Reputation: 3237
You can do something like this:
import spacy

nlp = spacy.load("en_core_web_sm")
word_list = ["apple", "eat", "good"]
# run the full pipeline on each word separately and print its coarse-grained POS
for word in word_list:
    doc = nlp(word)
    print(doc[0].text, doc[0].pos_)
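Alternatively, you can build a single Doc from the word list and pass it through each pipeline component (note that the words are then tagged as one sequence, so each tag can depend on its neighbours):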
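import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_sm")
word_list = ["apple", "eat", "good"]
# build a pre-tokenized Doc so each word becomes exactly one token
doc = Doc(nlp.vocab, words=word_list)
# apply every pipeline component in order
for name, proc in nlp.pipeline:
    doc = proc(doc)
pos_tags = [x.pos_ for x in doc]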
Upvotes: 2
Reputation: 11474
English unigrams are often hard to tag well, so think about why you want to do this and what you expect the output to be. (Why is the POS of "apple" in your example NNP? What's the POS of "can"?)
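To make the ambiguity concrete, here is a minimal sketch of a classic most-frequent-tag unigram tagger in plain Python (not spaCy). The tiny tagged corpus and the "NN" fallback for unseen words are hypothetical assumptions for illustration; a real unigram tagger would be trained on a large annotated corpus. Because it only looks at the word itself, it has to give "can" the same tag everywhere.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, Penn Treebank tag) pairs, for illustration only.
TAGGED_CORPUS = [
    ("I", "PRP"), ("can", "MD"), ("eat", "VB"), ("an", "DT"), ("apple", "NN"),
    ("you", "PRP"), ("can", "MD"), ("go", "VB"),
    ("a", "DT"), ("good", "JJ"), ("can", "NN"),
]


def train_unigram_tagger(tagged_corpus):
    """Map each lowercased word to the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def get_tags(words, model, default="NN"):
    """Tag each word in isolation; unseen words fall back to `default` (an assumption)."""
    return [model.get(word.lower(), default) for word in words]


model = train_unigram_tagger(TAGGED_CORPUS)
print(get_tags(["apple", "eat", "good", "can"], model))  # ['NN', 'VB', 'JJ', 'MD']
```

With this toy data "can" always comes out as MD because that is its majority tag, so the noun reading in "a good can" is lost; no unigram tagger can avoid that.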
spaCy isn't really intended for this kind of task, but if you want to use spaCy, one efficient way to do it is:
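import spacy

# spacy.load('en') relied on v2 shortcut links; load the model package by name
nlp = spacy.load("en_core_web_sm")
# disable everything except the tagger (and tok2vec, which the v3 tagger reads from)
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in ("tok2vec", "tagger")]
nlp.disable_pipes(*other_pipes)

words = ["apple", "eat", "good"]
# use nlp.pipe() instead of nlp() to process multiple texts more efficiently
for doc in nlp.pipe(words):
    if len(doc) > 0:
        print(doc[0].text, doc[0].tag_)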
See the documentation for nlp.pipe(): https://spacy.io/api/language#pipe
Upvotes: 5