Paul Kremershof

Reputation: 131

How can I modify the NLTK stop word list in Python?

I'm relatively new to the Python/programming community, so please excuse my relatively simple question. I would like to filter out stop words before lemmatizing a CSV file, but I need the stop words "this" and "these" to be kept in the final set.

After importing nltk stop words in Python and defining them as

from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))  # named stop_words so the stopwords module is not shadowed

... how can I modify this set so that "this" and "these" are kept in?

I know I could list every word manually except these two in question, but I was looking for a more elegant solution.

Upvotes: 1

Views: 3721

Answers (1)

cs95

Reputation: 402293

If you want "this" and "these" to survive filtering and appear in your final set, just remove them from the default stopwords list:

new_stopwords = set(stopwords.words('english')) - {'this', 'these'}

Or,

to_remove = ['this', 'these']
new_stopwords = set(stopwords.words('english')).difference(to_remove)

Unlike the - operator, which requires both operands to be sets, set.difference accepts any iterable.
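
To see the effect on actual text, here is a minimal sketch of the filtering step. It assumes a tiny hard-coded stand-in for the NLTK list so it runs without downloading the corpus; default_stopwords, to_keep and remove_stopwords are hypothetical names, not part of NLTK.

```python
# Stand-in for set(stopwords.words('english')); the real NLTK list is much longer.
default_stopwords = {"this", "these", "the", "a", "is", "are", "of"}

# Words that should survive filtering (hypothetical name).
to_keep = ["this", "these"]

# difference() accepts the list directly; no need to convert it to a set first.
new_stopwords = default_stopwords.difference(to_keep)


def remove_stopwords(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


tokens = "These are the results of this test".split()
filtered = remove_stopwords(tokens, new_stopwords)
print(filtered)
```

With the real list, swap the stand-in for set(stopwords.words('english')); the rest stays the same. The comparison is done on the lowercase form because the NLTK English stop words are lowercase.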

Upvotes: 5
