wordninja does not work with other languages

Question

I have a question I can't solve on my own. I am building an NLP preprocessing pipeline and thought about using wordninja with languages written in Cyrillic (Russian and Ukrainian). I set up the dictionaries as described and everything seemed fine, but I can't get it to work.

import wordninja

# Swap the default model for one built from a gzipped Russian word list
wordninja.DEFAULT_LANGUAGE_MODEL = wordninja.LanguageModel('setup/ru_ninja_dict.txt.gz')
print(wordninja.split("приветпока"))

(the output is an empty list [], while ["привет", "пока"] was expected)
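Besides the dictionary itself, it may be worth checking what happens to the input string before the model ever sees it. The sketch below assumes that wordninja first splits its input on the pattern `[^a-zA-Z0-9']+`. That is my reading of the wordninja 2.0.0 source, not something confirmed here, so verify it against your installed copy. If that assumption holds, every Cyrillic character counts as a separator, and nothing is left to segment, which would produce exactly an empty list. `ASCII_ONLY_SPLIT`, `UNICODE_SPLIT` and `chunks` are hypothetical names used only for this illustration.

```python
import re

# ASSUMPTION: mirrors the pre-split pattern I believe wordninja uses
# (read from wordninja.py 2.0.0; check your installed version).
ASCII_ONLY_SPLIT = re.compile("[^a-zA-Z0-9']+")

# Hypothetical Unicode-aware alternative: in Python 3 str patterns,
# \w also matches Cyrillic letters, so they are not treated as separators.
UNICODE_SPLIT = re.compile(r"[^\w']+")


def chunks(pattern, text):
    """Return the non-empty pieces left after splitting text on pattern."""
    return [piece for piece in pattern.split(text) if piece]


print(chunks(ASCII_ONLY_SPLIT, "приветпока"))  # -> []
print(chunks(UNICODE_SPLIT, "приветпока"))     # -> ['приветпока']
```

If the first line prints `[]` with your pattern, the problem would lie in how the input is tokenized rather than in the encoding of the dictionary file.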

My main suspicion is an encoding issue, but I don't know how to check that myself.
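One way to test the encoding hypothesis is to open the dictionary independently of wordninja and look at the raw bytes. The sketch below uses only the standard library. It assumes the dictionary is a gzipped, whitespace-separated word list (as the `.txt.gz` name suggests); `inspect_gz_dictionary` is a hypothetical helper, not part of wordninja.

```python
import gzip
import os


def inspect_gz_dictionary(path, expected_words=()):
    """Check whether a gzipped word list is valid UTF-8 and which
    of the expected words it actually contains."""
    with gzip.open(path, "rb") as f:
        raw = f.read()
    # A UTF-8 byte-order mark survives a plain "utf-8" decode as U+FEFF
    # and sticks to the first word, so that word would never match.
    has_bom = raw.startswith(b"\xef\xbb\xbf")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Typical for a file saved in a legacy encoding such as cp1251.
        return {"utf8": False, "error": str(err)}
    words = text.split()
    vocab = set(words)
    return {
        "utf8": True,
        "bom": has_bom,
        "num_words": len(words),
        "first_words": words[:5],
        "missing": [w for w in expected_words if w not in vocab],
    }


if __name__ == "__main__":
    path = "setup/ru_ninja_dict.txt.gz"
    if os.path.exists(path):
        print(inspect_gz_dictionary(path, ["привет", "пока"]))
```

If this reports valid UTF-8, no BOM, and no missing words, the dictionary file itself is probably not the culprit.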

Please let me know if you have any ideas!


