Reputation: 7154
I'd like to dump hunspell's pl_PL dictionary.
I found a solution: unmunch /usr/share/hunspell/pl_PL.dic /usr/share/hunspell/pl_PL.aff
But there's a problem with the encoding.
Part of the output:
ambasadorowaniom
ambasadorowaniach
ambasadorowa�
ambasadoruj�cy
ambasadoruj�cym
I've also tried filtering the output with iconv, but that didn't solve the problem:
affix: z�c� 4, strip: �� 2
affix: z�ce 4, strip: �� 2
affix: z�cej 5, strip: �� 2
stable 50 num is 470 flag G
parsing line: MAP 8
parsing line: MAP a�
parsing line: MAP c�
How can I solve this problem?
Upvotes: 4
Views: 1364
Reputation: 31
iconv solves the problem: the dictionary file seems to be encoded in ISO-8859-2 (ISO Latin-2) and has to be converted to UTF-8:
unmunch pl_PL.dic pl_PL.aff 2>/dev/null | iconv -f iso-8859-2 -t utf8
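If you don't know the encoding in advance, here is a slightly more general sketch. Hunspell affix files normally declare their encoding on a SET line, so it can be read from there instead of being hard-coded. The function names aff_encoding and dump_dictionary are made up for this example. The ISO8859-1 fallback for files without a SET line is an assumption. It also assumes that your iconv accepts the encoding name exactly as it is written in the .aff file (glibc's iconv accepts ISO8859-2, for instance).

```shell
# Print the encoding declared on the SET line of a hunspell .aff file.
# Assumption: fall back to ISO8859-1 when the file has no SET line.
aff_encoding() {
    awk '
        $1 == "SET" { sub(/\r$/, "", $2); print $2; found = 1; exit }
        END         { if (!found) print "ISO8859-1" }
    ' "$1"
}

# Expand all word forms and convert them to UTF-8.
# 2>/dev/null drops unmunch's messages on stderr, which bypass the pipe.
dump_dictionary() {
    dic=$1
    aff=$2
    unmunch "$dic" "$aff" 2>/dev/null |
        iconv -f "$(aff_encoding "$aff")" -t UTF-8
}

# Example:
#   dump_dictionary /usr/share/hunspell/pl_PL.dic /usr/share/hunspell/pl_PL.aff
```

The "affix:" and "parsing line:" lines shown in the question appear to be diagnostics rather than dictionary words, which would explain why piping through iconv alone left them unconverted.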
Upvotes: 2
Reputation: 31
Short version: it's a problem with your console or terminal. Switch to another one, such as xterm.
Longer version: Strange. The output should be UTF-8. Are you sure this isn't caused by your console or terminal not supporting UTF-8? Check the result in any UTF-8-capable graphical editor, and check your locale settings.
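One way to tell the two cases apart is to check whether the bytes themselves are valid UTF-8, independently of the terminal. This is only a sketch: is_utf8 is a name made up here, and it relies on iconv exiting with a non-zero status when it meets an invalid input sequence.

```shell
# Succeed only if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example:
#   unmunch pl_PL.dic pl_PL.aff 2>/dev/null > words.txt
#   is_utf8 words.txt && echo "valid UTF-8" || echo "not UTF-8"
#   locale charmap    # the character set your current locale expects
```

If the file is not valid UTF-8, the terminal is not at fault and the data has to be converted.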
Disclaimer: I want to help. But since I cannot comment (1 reputation point), request clarification, or send a message to the user, I have to post this as an answer so that it is not deleted.
Upvotes: 1