Hunspell (Python 3) how to handle umlauts

Question

I use CyHunspell with Python 3.6 (IDLE) on OS X to check whether words are spelled correctly. It works for most words, but not for words containing German umlauts such as ä, so I suspect an encoding problem. I have already tried several dictionaries, since the LibreOffice one from here is ISO8859-1. I tried this one for Sublime, which is UTF-8, but it doesn't work either. I also converted the LibreOffice file to ISO8859-1, but the behavior is still the same.

My code:

import os
from hunspell import Hunspell
hunspell_path = os.path.dirname(os.path.abspath(__file__)) + "/dictionaries"
h = Hunspell("de_DE_utf8", hunspell_data_dir=hunspell_path)
print(h.spell("Beispiel"))  # prints True, expected True
print(h.spell("überall"))   # prints False, expected True
print(h.spell("über"))      # prints True, expected True

What I don't understand is why "über" returns True while "überall" returns False.

All three words are in the "de_DE_utf8.dic":

beispiel/EPSozm
beispiel/hke
Beispiel/EPSmij
überall
Über/hij
über/Ske

Any idea what I could try to solve this problem? I found some information about UTF-8 and Python in other questions, but it was mostly about reading files.

glamredhel · Accepted Answer

I tried this with a different dictionary, https://extensions.libreoffice.org/extensions/german-de-de-frami-dictionaries, and it seems to work fine there. Note that I tested on Python 3.5 and Ubuntu 16.04, though.

import os
from hunspell import Hunspell

dict_path = .....
h = Hunspell("de_DE_frami", hunspell_data_dir=dict_path)
print(h.spell("Beispiel"))
print(h.spell("über"))
print(h.spell("überall"))
print(h.spell("Über"))

Output:

True
True
True
True
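
If a dictionary still rejects words with umlauts, one thing worth checking is whether the encoding declared in the .aff file matches the bytes actually stored in the .dic file. Hunspell reads the encoding from a SET line in the .aff file (for example SET UTF-8 or SET ISO8859-1). The following is only a rough diagnostic sketch, not part of CyHunspell: the helper names declared_encoding and encoding_mismatch are made up for illustration, and treating a missing SET line as "not UTF-8" is an assumption on my part.

```python
def declared_encoding(aff_path):
    """Return the encoding named on the SET line of a .aff file, or None."""
    with open(aff_path, "rb") as f:
        for raw in f:
            line = raw.strip()
            # Drop a UTF-8 byte order mark if the file starts with one.
            if line.startswith(b"\xef\xbb\xbf"):
                line = line[3:]
            parts = line.split()
            if len(parts) >= 2 and parts[0] == b"SET":
                return parts[1].decode("ascii", "replace")
    return None


def encoding_mismatch(aff_path, dic_path):
    """Return True if the .dic bytes contradict the encoding declared in .aff.

    Hypothetical helper: it only distinguishes UTF-8 from non-UTF-8.
    """
    declared = declared_encoding(aff_path)
    # Assumption: no SET line is treated as a non-UTF-8 dictionary.
    declared_utf8 = (
        declared is not None
        and declared.upper().replace("-", "") == "UTF8"
    )

    with open(dic_path, "rb") as f:
        data = f.read()

    # A pure-ASCII word list is valid under either encoding.
    if all(byte < 128 for byte in data):
        return False

    try:
        data.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False

    return declared_utf8 != is_utf8
```

For the dictionary from the question, this could be called as encoding_mismatch("dictionaries/de_DE_utf8.aff", "dictionaries/de_DE_utf8.dic"), assuming the .aff file sits next to the .dic file under the same base name. A result of True would suggest that one of the two files was converted without the other. The check is a heuristic: a short ISO8859-1 file can occasionally happen to be valid UTF-8 as well.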
