ladybug

TypeError: 'spacy.tokens.token.Token' object is not iterable

I'm trying to apply text preprocessing to a pandas column with spaCy. My goal is to preprocess the text and then use the cleaned column in further analysis together with the other columns.

Data:

    category    content
0   business    Quarterly profits at US media giant TimeWarne...
1   business    The dollar has hit its highest level against ...
2   business    The owners of embattled Russian oil giant Yuk...
3   business    British Airways has blamed high fuel prices f...
4   business    Shares in UK drinks and food firm Allied Dome...

My preprocessing:

    import spacy

    nlp = spacy.load('en_core_web_sm')
    doc = nlp(str(df['content']))

    new_corpus = [[words.lemma_ for words in docs if (not words.is_stop and not words.is_punct and not words.like_num)] for docs in doc]
    corpus_clean = [[word.lower() for word in docu if (word.isalpha())] for docu in new_corpus]

Error:

    TypeError: 'spacy.tokens.token.Token' object is not iterable

Answers (1)

Hannibal

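The problem is in how the DataFrame column is passed to spaCy. You want one document per row of 'content', but str(df['content']) turns the whole column into a single string: the printed representation of the Series, including the index labels and the "Name: content" footer. nlp() therefore returns one Doc for the entire column. Iterating over a Doc yields Token objects, so in "for docs in doc" each docs is a single Token, and the inner "for words in docs" tries to iterate over that Token, which raises TypeError: 'spacy.tokens.token.Token' object is not iterable.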
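To see what str(df['content']) actually hands to spaCy, here is a small sketch with two made-up rows (they are not the question's data). It only needs pandas and compares the string version of the column with the list that nlp.pipe() should receive instead.

```python
import pandas as pd

# Two made-up rows standing in for the question's 'content' column.
df = pd.DataFrame({
    "category": ["business", "business"],
    "content": [
        "Media group profits rose this quarter.",
        "The dollar climbed against the euro.",
    ],
})

# What the question passes to nlp(): ONE string holding the Series repr,
# i.e. index labels, the texts, and a "Name: content, dtype: ..." footer.
as_string = str(df["content"])

# What nlp.pipe() should receive instead: one plain string per row.
as_list = df["content"].tolist()

print(repr(as_string))
print(as_list)
```

spaCy would tokenize as_string as one document, index numbers and footer included, which is why the code ends up iterating over tokens instead of documents.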
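Change this line:

    doc = nlp(str(df['content']))

to this:

    doc = nlp.pipe(df['content'].tolist())

nlp.pipe() processes the texts as a stream and yields one Doc per row, in the same order as the DataFrame, so your two list comprehensions then iterate over documents and their tokens as intended. Note that it returns a generator, which can only be consumed once.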
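A minimal sketch of the whole cleaning step, assuming spaCy v3 Token attributes (lemma_, is_stop, is_punct, like_num, all already used in the question). The function name clean_docs is hypothetical; it folds the question's two comprehensions into a single pass with the same filters, and it takes any iterable of Docs so it does not depend on which model is loaded.

```python
def clean_docs(docs):
    """Return one list of lowercased, alphabetic lemmas per spaCy Doc.

    docs: any iterable of Doc objects, e.g. the generator from nlp.pipe().
    Iterating a Doc yields Token objects, so the inner loop runs over the
    tokens of one document, never over a single Token.
    """
    corpus = []
    for doc in docs:  # one Doc per DataFrame row
        corpus.append([
            tok.lemma_.lower()
            for tok in doc
            # same filters as the question: no stop words, punctuation or numbers
            if not (tok.is_stop or tok.is_punct or tok.like_num)
            # then keep only purely alphabetic lemmas (as word.isalpha() did)
            and tok.lemma_.isalpha()
        ])
    return corpus
```

Hypothetical usage, with nlp loaded as in the question and pandas imported as pd: df['clean'] = pd.Series(clean_docs(nlp.pipe(df['content'].tolist())), index=df.index). Because nlp.pipe() yields Docs in input order, each cleaned list lines up with its row, so the new column can be analysed together with 'category'.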
