Reputation: 4619
I have to read through a plain-text (UTF-8) file line-by-line and convert it into a .tex file (just another plain-text file with markup) for processing by a LaTeX processor.
One of the things I want to do is to convert special characters like é into their LaTeX representation: \'e
So I wrote:
with open(input, "r") as in_file, open(output, "w") as out_file:
    for line in in_file:
        # Other code here
        line.replace('é', "\\'e")  # This fails as below
        # Other code here
        out_file.write(line)
Running the script on an input file gives:
line.replace('é', "\\'e")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
So clearly the interpreter is using the ascii codec. Why?
Instead of the normal open(...), I also tried codecs.open(input, "r", "utf-8"), and similarly for the output file, but I get the same error.
Before running line.replace(...), I also tried each of the following lines in turn (one at a time, not both together) to convert line to a unicode string:
line = unicode(line, "utf-8")
line = line.decode("utf-8")
but I get exactly the same error.
What's the proper way to do it?
Update 1: I had already added # -*- coding: UTF-8 -*- as the second line of the .py file before asking this question. Without it, the interpreter gives the following error when I try to run the script:
SyntaxError: Non-ASCII character '\xc3' in file <filename> on line 46, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
Upvotes: 0
Views: 1209
Reputation: 2821
This is probably a source-encoding issue. Try placing these two lines at the top of your file:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
For more information, see PEP 263: https://www.python.org/dev/peps/pep-0263/
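If the coding declaration is already there (as Update 1 says), the error most likely comes from mixing byte strings and unicode strings. In Python 2, when a unicode string meets a plain str (for example a decoded line and the byte literal 'é'), the str is implicitly decoded with the ascii codec, which fails on byte 0xc3, the first byte of é in UTF-8. Note also that replace() returns a new string, so its result has to be assigned back. A minimal sketch of one way around both problems, assuming the file really is UTF-8: decode on read with io.open and use unicode literals throughout. The names LATEX_ESCAPES, to_latex and convert_file are made up for illustration.

```python
import io

# Hypothetical mapping from accented characters to LaTeX escapes.
# Unicode escapes keep the source file itself pure ASCII.
LATEX_ESCAPES = {
    u"\u00e9": u"\\'e",  # e with acute accent
    u"\u00e8": u"\\`e",  # e with grave accent
}


def to_latex(text):
    """Return a copy of the unicode string `text` with LaTeX escapes applied."""
    for char, escape in LATEX_ESCAPES.items():
        # replace() does not modify in place; keep the returned string.
        text = text.replace(char, escape)
    return text


def convert_file(in_path, out_path):
    """Read in_path as UTF-8, escape each line, write the result to out_path."""
    # io.open decodes on read and encodes on write, so every line is a
    # unicode string (this behaves the same in Python 2 and Python 3).
    with io.open(in_path, "r", encoding="utf-8") as in_file, \
            io.open(out_path, "w", encoding="utf-8") as out_file:
        for line in in_file:
            out_file.write(to_latex(line))
```

With this, something like convert_file("input.txt", "output.tex") should do the conversion. The key point is that both sides of replace() are unicode, so no implicit ascii decoding takes place.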
Upvotes: 1