Reputation: 109
I'm trying to apply punctuation removal, stopword removal, and lemmatization to a list of strings.
I tried to use lemma_, is_stop and is_punct:
data = ['We will pray and hope for the best',
'Though it may not make landfall all week if it follows that track',
'Heavy rains, capable of producing life-threatening flash floods, are possible']
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
nlp = spacy.load("en")
doc = list(nlp.pipe(data))
data_clean = [[w.lemma_ for w in doc if not w.is_stop and not w.is_punct and not w.like_num] for doc in data]
I have the following error: AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'lemma_'
(same problem for is_stop and is_punct)
Upvotes: 0
Views: 4543
Reputation: 314
You iterate over the unprocessed list of strings data in the outer loop, but you need to iterate over doc, the list of processed documents.
Further, your variable names are unfortunate; the following renaming should be less confusing:
docs = list(nlp.pipe(data))
data_clean = [[w.lemma_ for w in doc if (not w.is_stop and not w.is_punct and not w.like_num)] for doc in docs]
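For reference, here is a minimal end-to-end sketch of the corrected pipeline. It assumes a small English model such as en_core_web_sm is installed (python -m spacy download en_core_web_sm); spacy.load("en") is a deprecated model shortcut in recent spaCy releases, so the explicit model name is used here instead:
import spacy

# Assumes en_core_web_sm is installed; adjust the model name to whatever you use.
nlp = spacy.load("en_core_web_sm")

data = ['We will pray and hope for the best',
        'Though it may not make landfall all week if it follows that track',
        'Heavy rains, capable of producing life-threatening flash floods, are possible']

# Process all strings in one pass.
docs = list(nlp.pipe(data))

# Keep the lemma of every token that is not a stopword, punctuation, or number-like.
data_clean = [[w.lemma_ for w in doc
               if not w.is_stop and not w.is_punct and not w.like_num]
              for doc in docs]

print(data_clean)
The exact lemmas you get back depend on the model and spaCy version, but each inner list now contains the cleaned lemmas of one input string.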
Upvotes: 3