elfduck

Reputation: 11

Pyenchant messes up foreign characters

PyEnchant messes up non-ASCII characters and the spellcheck fails. My girlfriend is German, so I know that "häßlich" is a real German word, and I also checked it with several other spellchecking services.

The script file encoding is "ANSI as UTF-8". I have also tried encoding and decoding the word with several different character encodings.


#!/usr/bin/python
# -*- coding: utf-8 -*-

# Python bindings for the enchant spellcheck
import enchant

# Enchant dictionary
enchantdict = enchant.Dict("de_DE")

# Define german word for "ugly"
word = "häßlich"

# Print the original word and the spellchecked version of it
print word, "=", enchantdict.check(word)

And the output is as follows: häßlich = False


Also, if I change the script encoding into plain ANSI, this is what I get:

hõ¯lich =
** (python.exe:1096): CRITICAL **: enchant_dict_check: assertion `g_utf8_validate(word, len, NULL)' failed
Traceback (most recent call last):
  File "C:\Temp\koe.py", line 14, in <module>
    print word, "=", enchantdict.check(word)
  File "C:\Python27\lib\site-packages\enchant\__init__.py", line 577, in check
    self._raise_error()
  File "C:\Python27\lib\site-packages\enchant\__init__.py", line 551, in _raise_error
    raise eclass(default)
enchant.errors.Error: Unspecified Error

I am using pyenchant-1.6.5.win32.exe and python-2.7.3.msi on Windows 7.


...And if you have a better spellchecker in mind, please tell me about it, I will test it out :)

Upvotes: 1

Views: 1248

Answers (1)

Eric MSFT

Reputation: 3276

You are getting tripped up by the fact that Python 2 has two types of strings: byte strings and Unicode strings. You need a 'u' in front of the string literal for it to be a Unicode string:

word = u"häßlich"
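To see why the encoding of the bytes matters, here is a minimal sketch that needs no spellchecker at all. It encodes the same Unicode word two ways and tests whether each result is valid UTF-8, which is the kind of check the `g_utf8_validate` assertion in the question's traceback points to. Treating "plain ANSI" as cp1252 is my assumption (the usual Western European Windows code page), and `is_valid_utf8` is a hypothetical helper, not part of PyEnchant.

```python
# -*- coding: utf-8 -*-
# Sketch only: cp1252 stands in for "plain ANSI" (an assumption), and
# is_valid_utf8 is an illustrative helper, not a PyEnchant function.

word = u"häßlich"                      # Unicode string: 7 characters

utf8_bytes = word.encode("utf-8")      # ä and ß take two bytes each
ansi_bytes = word.encode("cp1252")     # ä and ß take one byte each


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(utf8_bytes))                 # 9
print(len(ansi_bytes))                 # 7
print(is_valid_utf8(utf8_bytes))       # True
print(is_valid_utf8(ansi_bytes))       # False
```

The cp1252 bytes are not valid UTF-8, which would explain the failed assertion when the script is saved as plain ANSI. Passing a Unicode string instead leaves the byte encoding to the library rather than to whatever encoding the source file happens to use.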

Also, häßlich is the old spelling of hässlich (the latter is in the dictionary and will be returned as a suggestion). If you want häßlich to be treated as correctly spelled, you can add it to your personal word list:

enchantdict.add(word)

Upvotes: 2
