Mathy

Reputation: 21

Applying SnowballStemmer to a Pandas dataframe for each word

So I want to apply stemming with SnowballStemmer to a column (unstemmed) of a dataframe in order to use a classification algorithm.

So my code looks like the following:

import pandas as pd
import nltk

df = pd.read_excel(...)
df["content"] = df['column2'].str.lower()
stopword_list = nltk.corpus.stopwords.words('dutch')
df['unstemmed'] = df['content'].apply(lambda x: ' '.join([word for word in x.split() if word not in stopword_list]))
df["unstemmed"] = df["unstemmed"].str.replace(r"[^a-zA-Z ]+", " ", regex=True).str.strip()
df["unstemmed"] = df["unstemmed"].replace(r'\s+', ' ', regex=True)

df['unstemmed'] = df['unstemmed'].str.split()
# stemmer is created earlier (not shown here)
df['stemmed'] = df['unstemmed'].apply(lambda x: [stemmer.stem(y) for y in x])

So first, I convert everything to lower case and remove all Dutch stopwords. This is followed by removing all special characters and then splitting the text into individual words. I checked, and all columns have the "object" dtype.
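For reference, here is a minimal self-contained sketch of the same pipeline. It assumes the stemmer is meant to be NLTK's Dutch SnowballStemmer (its creation is not shown in the question) and uses a made-up one-row dataframe in place of the Excel file:

import pandas as pd
import nltk
from nltk.stem.snowball import SnowballStemmer

nltk.download('stopwords')  # needed once for the Dutch stopword list
stemmer = SnowballStemmer('dutch')  # an instance, created with a language name
stopword_list = nltk.corpus.stopwords.words('dutch')

df = pd.DataFrame({'column2': ['Dit is een Voorbeeld-zin, met wat leestekens!']})
df['content'] = df['column2'].str.lower()
df['unstemmed'] = df['content'].apply(lambda x: ' '.join(w for w in x.split() if w not in stopword_list))
df['unstemmed'] = df['unstemmed'].str.replace(r'[^a-zA-Z ]+', ' ', regex=True).str.strip()
df['unstemmed'] = df['unstemmed'].replace(r'\s+', ' ', regex=True)
df['unstemmed'] = df['unstemmed'].str.split()
df['stemmed'] = df['unstemmed'].apply(lambda words: [stemmer.stem(w) for w in words])
print(df['stemmed'].iloc[0])  # list of stemmed Dutch words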

I get the following error: stem() missing 1 required positional argument: 'token'

How can I solve this?
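As a point of reference, this exact TypeError is what NLTK raises when stem() is called on the SnowballStemmer class itself rather than on an instance, so it may be worth checking how the stemmer variable (not shown above) is created. A small sketch of both cases:

from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer             # the class itself, no instance created
# stemmer.stem('woorden')             # raises: stem() missing 1 required positional argument: 'token'

stemmer = SnowballStemmer('dutch')    # an instance for Dutch
print(stemmer.stem('woorden'))        # stems a single word without error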

Upvotes: 0

Views: 272

Answers (0)
