How to change encoding of characters from file

Question

I have been reading quite a bit about encoding, and I'm still not sure I'm fully wrapping my head around it. I have a file encoded as ANSI with the word "Solluções" in it. I want to convert the file to UTF-8, but whenever I do, the characters get changed.

Code:

import codecs

# Read the input file and write its contents back out as UTF-8
with codecs.open(filename_in, 'r') as input_file, \
     codecs.open(filename_out, 'w', 'utf-8') as output_file:
    output_file.write(input_file.read())

Result: "SolluÃ§Ãµes"

I imagine this is a stupid problem, but I am at an impasse at the moment. I tried to call encode('utf-8') on the individual data in the file prior to writing it to no avail, so I'm guessing that's not correct either... I appreciate any help, thank you!

Josh Durham · Accepted Answer

An SO answer to a similar question passes the encoding of the input file explicitly, like codecs.open(sourceFileName, "r", "your-source-encoding"). Without that, Python may not interpret the characters correctly, because it does not detect the original encoding on its own.
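As a minimal sketch of that idea, assuming Python 3 and assuming the source file really is cp1252 (the function name convert_to_utf8 and its source_encoding parameter are made up for illustration, not part of any library):

```python
def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file from source_encoding to UTF-8.

    source_encoding defaults to cp1252 as an assumption; pass whatever
    encoding the file was actually saved in.
    """
    # Decode using the declared source encoding instead of a default.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()

    # Encode the decoded text as UTF-8 when writing it back out.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Calling convert_to_utf8(filename_in, filename_out) with the question's variables would then decode the input as cp1252 and write UTF-8. For reference, "SolluÃ§Ãµes" is what the UTF-8 bytes of "Solluções" look like when they are decoded as cp1252, so the garbled result points to a decode step using the wrong encoding somewhere along the way.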

Warning about the encodings: Most people talking about ANSI mean one of the Windows codepages; you may really have a file in CP (codepage) 1252, which is almost, but not quite, the same thing as ISO-8859-1 (Latin 1). If so, use cp1252 instead of latin-1 as your-source-encoding.
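To see where the two encodings differ, here is a small Python 3 sketch; the sample bytes are chosen only for illustration. The accented letters decode identically, while bytes in the 0x80-0x9F range do not:

```python
# 0x80 and 0x93 fall in the range where the two encodings disagree;
# 0xE7 and 0xF5 are "ç" and "õ" in both.
raw = b"\x80\x93\xe7\xf5"

as_cp1252 = raw.decode("cp1252")    # euro sign, left curly quote, ç, õ
as_latin1 = raw.decode("latin-1")   # two C1 control characters, ç, õ

# ascii() makes the invisible control characters show up as escapes.
print(ascii(as_cp1252))
print(ascii(as_latin1))
```

For a word like "Solluções" either encoding gives the same text, so the distinction only matters if the file also contains characters such as curly quotes or the euro sign. Also note that Python's cp1252 codec raises an error on the five bytes that codepage leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), whereas latin-1 accepts every byte.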
