nassimlaga

Reputation: 61

PySpark: how to configure StopWordsRemover for French on Spark 1.6.3

I would like to know how to configure StopWordsRemover for the French language in Spark 1.6.3.

I'm currently using PySpark.

Thanks for your help.

Best regards,

Upvotes: 4

Views: 2449

Answers (2)

André Machado

Reputation: 724

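Take a look at the nltk package. Its stopwords corpus ships word lists for several languages, including French.

I use it for Portuguese words (for French, pass 'french' instead of 'portuguese'):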

from pyspark.ml.feature import StopWordsRemover
import nltk
nltk.download("stopwords")

...

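# Load the stop word list for the target language from the nltk corpus
stopwordList = nltk.corpus.stopwords.words('portuguese')
# tokenizer is assumed to be defined in the omitted code above
remover = StopWordsRemover(inputCol=tokenizer.getOutputCol(), outputCol="stopWordsRem", stopWords=stopwordList)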

Hope it helps

Upvotes: 3

Abdel Jaidi

Reputation: 326

According to the PySpark 1.6.3 docs, pyspark.ml.feature.StopWordsRemover does not have a language parameter. However, you can always provide your own list of stop words via the stopWords parameter.
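A minimal sketch of that approach, in case it helps. The French list below is a tiny hand-written sample for illustration only (a real list is much longer), and the helper names remove_stopwords and build_french_remover are made up for this example. The plain-Python function only illustrates the kind of filtering the transformer applies to each token list, assuming the default caseSensitive=False; it is not Spark's actual implementation.

```python
# Tiny illustrative sample, NOT a complete French stop word list.
FRENCH_STOPWORDS_SAMPLE = ["le", "la", "les", "de", "des", "du", "un", "une", "et", "est"]


def remove_stopwords(tokens, stop_words, case_sensitive=False):
    """Plain-Python illustration of stop word filtering on one token list."""
    if case_sensitive:
        blocked = set(stop_words)
        return [t for t in tokens if t not in blocked]
    # Case-insensitive comparison: lowercase both sides before matching.
    blocked = set(w.lower() for w in stop_words)
    return [t for t in tokens if t.lower() not in blocked]


def build_french_remover(input_col, output_col, stop_words):
    """Hypothetical helper: builds the Spark transformer with a custom list.

    pyspark is imported lazily so the rest of this sketch runs without it.
    Creating the transformer needs an active SparkContext.
    """
    from pyspark.ml.feature import StopWordsRemover
    return StopWordsRemover(inputCol=input_col, outputCol=output_col,
                            stopWords=stop_words)


tokens = ["Le", "chat", "est", "sur", "la", "table"]
filtered = remove_stopwords(tokens, FRENCH_STOPWORDS_SAMPLE)
print(filtered)
```

With a running SparkContext you would call something like build_french_remover("words", "filtered", your_french_list) and then remover.transform(df), where "words" and "filtered" are placeholder column names and df is a DataFrame holding the tokenized text.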

Upvotes: 0
