Reputation: 61
I would like to know how to configure StopWordsRemover for the French language in Spark 1.6.3.
I'm currently using PySpark.
Thanks for your help.
Best regards,
Upvotes: 4
Views: 2449
Reputation: 724
Take a look at the nltk package.
I use it for Portuguese words:
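from pyspark.ml.feature import StopWordsRemover
import nltk

# Download the NLTK stop-word corpora (only needed once per machine).
nltk.download("stopwords")
...
# Load the Portuguese list; pass 'french' instead to get French stop words.
stopwordList = nltk.corpus.stopwords.words('portuguese')
# `tokenizer` comes from the elided code above.
remover = StopWordsRemover(inputCol=tokenizer.getOutputCol(), outputCol="stopWordsRem", stopWords=stopwordList)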
Hope it helps.
Upvotes: 3
Reputation: 326
According to the PySpark 1.6.3 docs, pyspark.ml.feature.StopWordsRemover does not have a language parameter. However, you can always provide your own list of stop words via the "stopWords" parameter.
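A minimal sketch of what that could look like for French. The list below is a tiny hand-written subset for illustration only, and the names FRENCH_STOPWORDS, build_french_remover and preview_filter are hypothetical helpers, not part of Spark. preview_filter is only a plain-Python approximation of the default case-insensitive filtering, handy for checking a list without a cluster.

```python
# Illustrative subset only (assumption): a real French list is much longer,
# e.g. the one shipped with NLTK's stopwords corpus.
FRENCH_STOPWORDS = ["le", "la", "les", "un", "une", "des", "de", "du", "et", "en", "est"]


def build_french_remover(input_col, output_col, stopwords=FRENCH_STOPWORDS):
    """Return a StopWordsRemover that uses a custom stop-word list."""
    # Imported lazily so the rest of this sketch runs without Spark installed.
    from pyspark.ml.feature import StopWordsRemover
    return StopWordsRemover(inputCol=input_col, outputCol=output_col,
                            stopWords=list(stopwords))


def preview_filter(tokens, stopwords=FRENCH_STOPWORDS):
    """Plain-Python approximation of case-insensitive stop-word removal."""
    stop = set(w.lower() for w in stopwords)
    return [t for t in tokens if t.lower() not in stop]
```

With a DataFrame df that has a tokenized column "words" (both names assumed here), usage would be along the lines of build_french_remover("words", "filtered").transform(df).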
Upvotes: 0