I want to apply stemming with NLTK's SnowballStemmer to a column (unstemmed) of a DataFrame so that I can use a classification algorithm.
My code looks like this:
import nltk
import pandas as pd

df = pd.read_excel(...)
df["content"] = df['column2'].str.lower()
stopword_list = nltk.corpus.stopwords.words('dutch')
df['unstemmed'] = df['content'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stopword_list)]))
df["unstemmed"] = df["unstemmed"].str.replace(r"[^a-zA-Z ]+", " ", regex=True).str.strip()
df["unstemmed"] = df["unstemmed"].replace(r'\s+', ' ', regex=True)
df['unstemmed'] = df['unstemmed'].str.split()
df['stemmed'] = df['unstemmed'].apply(lambda x : [stemmer.stem(y) for y in x])
First, I convert the text to lower case and remove all Dutch stopwords. Then I replace every non-letter character with a space, collapse repeated whitespace, and split each row into a list of words. I checked, and all columns have dtype "object".
I get the following error: stem() missing 1 required positional argument: 'token'
How can I solve this?
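For reference, this kind of message typically appears when stem is called on a class itself rather than on an instance of it. The line that creates stemmer is not shown above, so the following is only a sketch under that assumption. ToyStemmer is a hypothetical stand-in with the same stem(self, token) shape, used so the snippet runs without NLTK; with NLTK the analogous instance would be created with SnowballStemmer("dutch").

```python
# Hypothetical stand-in for a stemmer class; not part of NLTK.
class ToyStemmer:
    def __init__(self, language):
        self.language = language

    def stem(self, token):
        # Toy rule for illustration only: drop a trailing "en".
        return token[:-2] if token.endswith("en") else token


# Assumed cause: the name is bound to the class, not to an instance.
stemmer = ToyStemmer
try:
    stemmer.stem("fietsen")  # "fietsen" is bound to self, so token is missing
except TypeError as exc:
    error_message = str(exc)

# Likely fix: instantiate first, then call stem on the instance.
# With NLTK this would be: stemmer = SnowballStemmer("dutch")
stemmer = ToyStemmer("dutch")
stemmed = [stemmer.stem(word) for word in ["fietsen", "boek"]]
```

If stemmer is already an instance in the real code, the cause lies elsewhere and this sketch does not apply.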