Python codecs module

Question

I am trying to load a UTF-8 file containing text in 14 different languages into Python (version 2.6.6). I am using the Python codecs module to decode the text file.

import codecs
f = open('C:/temp/list_test.txt', 'r')
for lines in f:
    line = filter_str(lines.decode("utf-8"))

This all works well. I parse the entire file and then want to export 14 separate language files. The problem I can't understand is the following.

I use the following code for output:

malangout = codecs.open("C:/temp/'polish.txt",'w','utf-8','surrogateescape')
for item in lang_dic['English']:
    temp = lang_dic[lang1][item]
    malangout.write(temp + '\n')
malangout.close()

Example:

Language: Polish
Expected output: Dziennik zakłóceń
Actual output: Dziennik zak‚óceƒ

The string is stored in memory as follows:

u'Dziennik zak\u201a\xf3ce\u0192'
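
One way to check whether the damage happens on output or earlier is to compare the stored string with the expected one, code point by code point. The sketch below is illustrative only: `expected` and `differing_chars` are names introduced here and are not part of the original script, and the expected string is typed in by hand from the example above.

```python
# -*- coding: utf-8 -*-
import unicodedata

# The string as it is held in memory, copied from the repr above.
stored = u'Dziennik zak\u201a\xf3ce\u0192'
# The expected text, written with explicit escapes (assumption: typed by hand).
expected = u'Dziennik zak\u0142\xf3ce\u0144'


def differing_chars(a, b):
    """Return (index, name_in_a, name_in_b) for each position where a and b differ."""
    return [(i, unicodedata.name(x), unicodedata.name(y))
            for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = differing_chars(stored, expected)
for d in diffs:
    print("%d: %s (expected %s)" % d)

# Encoding to UTF-8 and decoding again should return the same string
# if the UTF-8 codec itself is not altering anything.
round_trip_ok = stored.encode('utf-8').decode('utf-8') == stored
print(round_trip_ok)
```

If the round trip is lossless and the differing positions already hold U+201A and U+0192, the writer is reproducing what it was given. That would point at the steps before the write (the real encoding of the source file, or `filter_str`) rather than at `codecs.open`.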

I have tried many encodings from the Python docs (section 7.8, codecs). Any information would help at this point.
