Reputation: 1
We changed to a hosted web-based system that produces UTF-8 encoded files, but we have legacy applications that require ANSI cp1252 encoded files. Converting is not a problem, but special French characters in names get munged into 2 bytes in the translation. That is not surprising, but users are insisting that the French characters be retained.
I wrote a Python program to translate the file as follows:
import io
src_path='0189enr.asc'
dst_path='a-new-file2.txt'
outcontent=""
changes=0
with io.open(src_path, mode="r", encoding="utf8") as fd:
    content = fd.read()
for char in content:
    if char == 'è':
        outchar=138
        outcontent=outcontent+chr(outchar)
    if char == 'ô':
        outchar=147
        outcontent=outcontent+chr(outchar)
    if char == 'é':
        outchar=130
        outcontent=outcontent+chr(outchar)
    else:
        outcontent=outcontent+char
with io.open(dst_path, mode="w", encoding="cp1252") as fd:
    fd.write(outcontent)
UnicodeEncodeError: 'charmap' codec can't encode character '\x82' in position 300132: character maps to <undefined>
I'm stuck - how can I convert these specific characters and produce a cp1252 encoded file? Any help would be appreciated, thanks!
Upvotes: -1
Views: 434
Reputation: 119847
A correct utf8-to-cp1252 translator in Python looks like this:
import io
# src_path and dst_path are the same variables as in the question
with io.open(src_path, mode="r", encoding="utf8") as fd:
    content = fd.read()
with io.open(dst_path, mode="w", encoding="cp1252") as fd:
    fd.write(content)
Note how the entire middle section of your code is just gone.
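A minimal sketch of why the manual substitution is not needed, and why it triggers the error from the question. It is only an illustration: the names `single_bytes` and `control_char_fails` are made up here. It checks that each of the three letters already encodes to one byte in cp1252, and that `chr(130)` (the character U+0082) does not encode at all.

```python
# Each accented letter already has its own single-byte slot in cp1252,
# so the codec does the translation without any manual mapping.
single_bytes = {ch: ch.encode("cp1252") for ch in "èôé"}
for ch, encoded in single_bytes.items():
    print(repr(ch), "->", encoded, "length", len(encoded))

# chr(130) is the control character U+0082, not 'é'. The cp1252 codec
# has no byte for it, which is the UnicodeEncodeError in the question.
try:
    chr(130).encode("cp1252")
    control_char_fails = False
except UnicodeEncodeError as exc:
    control_char_fails = True
    print("cannot encode", repr(chr(130)), "-", exc.reason)
```

In other words, the loop in the question replaces characters that cp1252 can represent with characters that it cannot.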
A standard utility like iconv will do the same thing better.
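If the conversion has to stay in Python, one possible variation is sketched below. The function name `convert_utf8_to_cp1252` and its `errors` parameter are hypothetical, not something from the question or from iconv. It streams the file line by line, leaves line endings untouched, and lets the caller decide what happens to characters that cp1252 has no byte for.

```python
def convert_utf8_to_cp1252(src_path, dst_path, errors="strict"):
    """Re-encode a UTF-8 text file as cp1252, one line at a time.

    errors="strict" raises UnicodeEncodeError on a character that
    cp1252 cannot represent; errors="replace" writes '?' instead.
    """
    # newline="" disables newline translation on both sides, so the
    # original line endings are copied through unchanged.
    with open(src_path, mode="r", encoding="utf8", newline="") as src, \
         open(dst_path, mode="w", encoding="cp1252",
              errors=errors, newline="") as dst:
        for line in src:
            dst.write(line)
```

With the file names from the question this would be called as `convert_utf8_to_cp1252('0189enr.asc', 'a-new-file2.txt')`. Leaving `errors` at `"strict"` is probably the safer default for names, since a failure points at the exact character that needs attention.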
Upvotes: 3