Reputation: 73
I only want to see words where is_alpha is true and is_stop is false, and at the end I would like to store the lemma of each of those words. Thank you :)
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    # Print each token's attributes; the last two columns are the flags to filter on
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, not token.is_stop)
Upvotes: 1
Views: 166
Reputation: 626758
You can use a list comprehension:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
print([t.lemma_ for t in doc if t.is_alpha and not t.is_stop])
Output:
['Apple', 'look', 'buy', 'startup', 'billion']
Here, if the token consists only of letters and is not a stop word (if t.is_alpha and not t.is_stop), the token's lemma is returned (t.lemma_).
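If you want to keep the lemmas rather than just print them, one option is to wrap the filter in a small function and assign its result to a variable. The snippet below is only a sketch: content_lemmas and FakeToken are made-up names, and FakeToken is a hand-written stand-in exposing lemma_, is_alpha and is_stop so the example runs without a spaCy model installed. With spaCy you would pass the doc itself instead of the sample list.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy token (only the attributes used here)."""
    lemma_: str
    is_alpha: bool
    is_stop: bool


def content_lemmas(tokens: Iterable) -> List[str]:
    """Return the lemmas of tokens that are alphabetic and not stop words."""
    return [t.lemma_ for t in tokens if t.is_alpha and not t.is_stop]


# Hand-written sample data (an assumption, not real spaCy output).
sample = [
    FakeToken("Apple", True, False),
    FakeToken("be", True, True),      # stop word, filtered out
    FakeToken("look", True, False),
    FakeToken("U.K.", False, False),  # contains punctuation, filtered out
    FakeToken("$", False, False),     # symbol, filtered out
]

lemmas = content_lemmas(sample)  # stored in a variable for later use
print(lemmas)  # ['Apple', 'look']
```

Because the function only reads those three attributes, calling content_lemmas(doc) on a real spaCy doc should behave the same way as the list comprehension above.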
Upvotes: 2