Reputation: 131
I'm relatively new to the Python/programming community, so please excuse my simple question. I would like to filter out stop words before lemmatizing a CSV file, but I need the stop words "this"/"these" to be included in the final set.
After importing nltk stop words in Python and defining them as
stop_words = set(stopwords.words('english'))  # separate name, so the imported stopwords module is not overwritten
... how can I modify this set so that "this"/"these" are not filtered out?
I know I could list every word manually except these two in question, but I was looking for a more elegant solution.
Upvotes: 1
Views: 3721
Reputation: 402293
If you want those stopwords included in your final set, just remove them from the default stopwords list:
new_stopwords = set(stopwords.words('english')) - {'this', 'these'}
Or,
to_remove = ['this', 'these']
new_stopwords = set(stopwords.words('english')).difference(to_remove)
set.difference accepts any iterable, so to_remove can stay a plain list.
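To show the effect on actual tokens, here is a minimal sketch. It assumes a short hard-coded list as a stand-in for stopwords.words('english') so that it runs without downloading the NLTK corpus; the helper name remove_stopwords and the sample tokens are made up for illustration.

```python
# Stand-in for stopwords.words('english'); the real NLTK list is much longer.
default_stopwords = ['this', 'these', 'the', 'is', 'a', 'are', 'of']

# Words that should survive filtering.
keep = {'this', 'these'}
custom_stopwords = set(default_stopwords) - keep


def remove_stopwords(tokens, stop_set):
    """Return the tokens whose lowercase form is not in stop_set."""
    return [t for t in tokens if t.lower() not in stop_set]


tokens = ['This', 'is', 'a', 'list', 'of', 'these', 'words']
filtered = remove_stopwords(tokens, custom_stopwords)
print(filtered)  # ['This', 'list', 'these', 'words']
```

With the real corpus, replacing default_stopwords by stopwords.words('english') should behave the same way, since both "this" and "these" are in NLTK's English list.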
Upvotes: 5