Decent multi-lingual stemmer or analyzer for Lucene / ElasticSearch?

Question

I'm curious if there are generic analyzers which do a decent job of stemming / analyzing text which could be in different languages. For certain tasks, doing proper multi-lingual search (e.g. splitting a field name into name.english, name.french, etc.) seems like overkill.

Is there an analyzer which will strip suffixes (e.g. "dogs" --> "dog") and work for more than just English? I don't really care whether it does language detection, etc., and working on just e.g. romantic & germanic languages would probably be good enough. Or, is the loss of quality serious enough that it's always worth just using language-specific analyzers and language-specific queries?

Decent multi-lingual stemmer or analyzer for Lucene / ElasticSearch?

Answers (1)

Related Questions