user246821

Reputation: 1

Converting Special Characters in UTF-8 file to cp1252 file in Python

We moved to a hosted web-based system that produces UTF-8 encoded files, but our legacy applications require ANSI cp1252 encoded files. Converting is not a problem in itself; however, special French characters in names get munged into 2 bytes in the translation. That is not surprising, but users insist that the French characters be retained.

I wrote a Python program to translate the file as follows:

import io

src_path='0189enr.asc'
dst_path='a-new-file2.txt'
outcontent=""
changes=0

with io.open(src_path, mode="r", encoding="utf8") as fd:
    content = fd.read()

for char in content:
    if char == 'è':
        outchar=138
        outcontent=outcontent+chr(outchar)
    if char == 'ô':
        outchar=147
        outcontent=outcontent+chr(outchar)
    if char == 'é':
        outchar=130
        outcontent=outcontent+chr(outchar)
    else:
        outcontent=outcontent+char

with io.open(dst_path, mode="w", encoding="cp1252") as fd:
    fd.write(outcontent)

However, the program fails on the fd.write() call with the error:

UnicodeEncodeError: 'charmap' codec can't encode character '\x82' in position 300132: character maps to <undefined>

I'm stuck: how can I convert these specific characters and produce a cp1252 encoded file? Any help would be appreciated, thanks!

Upvotes: -1

Views: 434

Answers (1)

n. m. could be an AI

Reputation: 119847

A correct utf8-to-cp1252 translator in Python looks like this:

with io.open(src_path, mode="r", encoding="utf8") as fd:
    content = fd.read()

with io.open(dst_path, mode="w", encoding="cp1252") as fd:
    fd.write(content)

Note how the entire middle section of your code is just gone.
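
To illustrate why the manual mapping is unnecessary, here is a minimal sketch. It assumes the values 138, 147 and 130 in the question were taken from an older DOS code page (they match è, ô and é in code page 437), which is a guess on my part. The names encoded and error_message are only for illustration.

```python
# Each of these characters already has a single-byte cp1252 encoding,
# so the codec can translate them without any manual mapping.
encoded = {ch: ch.encode("cp1252") for ch in "èôé"}
for ch, raw in encoded.items():
    print(repr(ch), raw)

# chr(130) is the Unicode control character U+0082, not 'é'.
# cp1252 has no byte for U+0082, so encoding it raises UnicodeEncodeError.
error_message = None
try:
    chr(130).encode("cp1252")
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)
```

In other words, once the file is read as UTF-8 the characters are already correct in memory, and writing with encoding="cp1252" picks the right byte for each one.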

A standard utility like iconv will do the same thing better.
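
If you would rather stay in Python, the same conversion can be wrapped in a small helper. This is only a sketch: the function name utf8_to_cp1252 and the demo file names are hypothetical, and the errors="replace" choice is an assumption about how you want to handle characters that have no cp1252 equivalent (the default, "strict", raises instead).

```python
import os
import tempfile


def utf8_to_cp1252(src_path, dst_path, errors="strict"):
    """Read a UTF-8 text file and write it back out as cp1252.

    errors is passed to the output codec: "strict" raises on characters
    that cp1252 cannot represent, "replace" writes '?' in their place.
    """
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, mode="r", encoding="utf-8", newline="") as src:
        content = src.read()
    with open(dst_path, mode="w", encoding="cp1252",
              errors=errors, newline="") as dst:
        dst.write(content)


# Demo with throwaway files (hypothetical names, not the asker's data).
tmp_dir = tempfile.mkdtemp()
src_demo = os.path.join(tmp_dir, "demo-utf8.txt")
dst_demo = os.path.join(tmp_dir, "demo-cp1252.txt")

with open(src_demo, mode="w", encoding="utf-8", newline="") as f:
    # The last character (U+4E2D) has no cp1252 equivalent.
    f.write("Hélène Côté \u4e2d\n")

utf8_to_cp1252(src_demo, dst_demo, errors="replace")

with open(dst_demo, mode="rb") as f:
    result = f.read()
print(result)
```

The French letters come out as single bytes, and only the character that cp1252 cannot represent is replaced.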

Upvotes: 3
