When a text file is opened for reading using (say) UTF-8 encoding, is it possible to change the encoding while reading?
Motivation: Sometimes you need to read a text file that was written using a non-default encoding, and the text format itself may declare which encoding was used. HTML, XML, AsciiDoc and many other formats work this way. In such cases, the lines before the encoding declaration are allowed to contain only ASCII (or some default encoding).
In Python, it is possible to read the file in binary mode and convert the lines of `bytes` type to `str` on your own. When the encoding information is found on some line, you just switch the encoding used to convert the following lines to Unicode strings.
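A minimal sketch of that binary-mode approach, assuming a hypothetical marker line of the form `encoding: <name>` (real formats such as HTML or XML use their own syntax, so the regular expression would need adapting). It also assumes an ASCII-compatible encoding, because the binary stream is split into lines on `b"\n"`, which does not work for UTF-16.

```python
import io
import re

# Hypothetical marker syntax, e.g. "encoding: cp1250"; adapt it to the real format.
ENCODING_RE = re.compile(rb"encoding[:=]\s*([-\w.]+)")


def decode_lines(raw, initial="ascii"):
    """Yield str lines from a binary stream, switching the codec after a marker line."""
    encoding = initial
    for line in raw:  # iterating a binary stream yields bytes lines split on b"\n"
        yield line.decode(encoding)
        match = ENCODING_RE.search(line)
        if match:  # every following line is decoded with the declared encoding
            encoding = match.group(1).decode("ascii")


data = "header\nencoding: cp1250\nžluťoučký kůň\n".encode("cp1250")
lines = list(decode_lines(io.BytesIO(data)))
print(lines)
```

With a real file, the same generator works on `open(path, "rb")`.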
In Python 3, text files are implemented using `io.TextIOWrapper`, a subclass of `io.TextIOBase`, which also defines the `encoding` attribute, the `buffer` attribute, and other things.
Is there any nice way to change the encoding information (used for decoding the `bytes`) so that the following lines are decoded as desired?
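Regarding the `TextIOBase` part of the question: Python 3.7 added `io.TextIOWrapper.reconfigure()`, but its documentation states that the encoding cannot be changed once data has been read from the stream. A workaround close to the binary-mode idea in the question is to read the header through the binary stream and only then wrap the remainder in a new `io.TextIOWrapper`. The function name `open_after_declaration` and the `encoding: <name>` marker are hypothetical.

```python
import io
import re

# Hypothetical marker syntax, as in "encoding: latin-1".
ENCODING_RE = re.compile(rb"encoding[:=]\s*([-\w.]+)")


def open_after_declaration(raw, default="ascii"):
    """Read lines as bytes until an encoding marker is found.

    Returns (header_lines, text_stream); text_stream is an io.TextIOWrapper
    that decodes the rest of `raw` with the declared encoding.
    """
    header = []
    encoding = default
    while True:
        line = raw.readline()
        if not line:  # end of stream, no marker found: keep the default
            break
        header.append(line.decode(default))
        match = ENCODING_RE.search(line)
        if match:
            encoding = match.group(1).decode("ascii")
            break
    # The wrapper decodes from the current position of `raw`;
    # closing the wrapper also closes `raw`.
    return header, io.TextIOWrapper(raw, encoding=encoding)


data = "title\nencoding: latin-1\ncafé crème\n".encode("latin-1")
header, rest = open_after_declaration(io.BytesIO(data))
print(header)
print(rest.read())
```

The header lines come back already decoded, and the returned wrapper behaves like any text file for the rest of the content, so it can be iterated line by line.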
Classic usage is:
Then:
See this example: Detect character encoding in an XML file (Python recipe). Note: the code is a little old, but useful.
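A minimal sketch of the XML case, in the spirit of that recipe but not taken from it: check for a byte order mark, then look for `encoding="..."` in the XML declaration, and otherwise fall back to UTF-8, the XML default. The function name `sniff_xml_encoding` is hypothetical, and the sketch deliberately ignores UTF-32 and EBCDIC documents.

```python
import codecs
import re

# XML declaration at the very start, e.g. <?xml version="1.0" encoding="ISO-8859-2"?>
XML_DECL_RE = re.compile(rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][\w.-]*)["\']')


def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes (simplified)."""
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also strips the BOM when decoding
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    match = XML_DECL_RE.match(head)
    if match:
        return match.group(1).decode("ascii")
    return "utf-8"  # no BOM and no declaration


head = b'<?xml version="1.0" encoding="windows-1250"?>\n<doc/>'
print(sniff_xml_encoding(head))  # windows-1250
```

If the goal is only to parse XML, the stdlib parsers (for example `xml.etree.ElementTree` fed with bytes) already honour the declaration themselves.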