Reputation: 786
I am preprocessing a text and want to remove common German stopwords. The following code almost works (final_wordlist is example data):
from nltk.corpus import stopwords
final_wordlist = ['Status', 'laufende', 'Projekte', 'bei', 'Stand', 'Ende', 'diese', 'Bei']
stopwords_ger = stopwords.words('german')
filtered_words = [w for w in final_wordlist if w not in stopwords_ger]
print(filtered_words)
This yields:
['Status', 'laufende', 'Projekte', 'Stand', 'Ende', 'Bei']
As you can see, the capitalized 'Bei' is not removed, although it should be, because the stopwords from nltk are all lower case. Is there an easy way to remove all stopwords case-insensitively?
Upvotes: 0
Views: 3355
Reputation: 460
Lowercase each word before checking it against the stopword list:
filtered_words = [w for w in final_wordlist if w.lower() not in stopwords_ger]
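If the word list is long, it may help to turn the stopwords into a set first, so each lookup does not scan the whole list. Here is a minimal sketch of that idea. The helper name remove_stopwords is made up for illustration, and the two-entry stopword list is only a stand-in for stopwords.words('german') so the snippet runs without the nltk corpus; it contains just the two words the question's output shows being filtered.

```python
def remove_stopwords(words, stopwords):
    """Return the words whose lowercased form is not a stopword.

    Hypothetical helper, not part of nltk.
    """
    # Build the set once so each membership test is a hash lookup.
    stopword_set = {s.lower() for s in stopwords}
    # Compare the lowercased word, but keep the original spelling in the result.
    return [w for w in words if w.lower() not in stopword_set]


# Stand-in for stopwords.words('german') (assumption: reduced to two entries).
stopwords_ger = ['bei', 'diese']

final_wordlist = ['Status', 'laufende', 'Projekte', 'bei', 'Stand', 'Ende', 'diese', 'Bei']
filtered_words = remove_stopwords(final_wordlist, stopwords_ger)
print(filtered_words)
# prints ['Status', 'laufende', 'Projekte', 'Stand', 'Ende']
```

With the real nltk list you would pass stopwords.words('german') as the second argument instead of the stand-in.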
Upvotes: 6