Reputation: 2700
I'm using Python 2.6 to read a latin2-encoded file with Windows line endings ('\r\n').
import codecs
file = codecs.open('stackoverflow_secrets.txt', encoding='latin2', mode='rt')
line = file.readline()
print(repr(line))
outputs : u'login: yabcok\n'
file = codecs.open('stackoverflow_secrets.txt', encoding='latin2', mode='r')
line = file.readline()
print(repr(line))
or
file = codecs.open('stackoverflow_secrets.txt', encoding='latin2', mode='rb')
line = file.readline()
print(repr(line))
outputs : u'password: l1x1%Dm\r\n'
My questions:
Is the codecs module commonly used with binary files?
Upvotes: 1
Views: 2611
Reputation: 536567
mode='rt'
'rt' isn't a real mode as such - that will do the same as 'r'.
Why text mode is not the default?
See Torsten's answer.
Also, if you are using anything but Windows, text mode files behave identically to binary files anyway.
You may instead be thinking of 'U'niversal newlines mode, which attempts to let text-mode files from other platforms work. Whilst it is possible to pass a 'U' flag to codecs.open, given the documentation quoted in Torsten's answer I think that's a bug. Certainly the results would go wrong on UTF-16 and some East Asian codecs, so don't rely on it.
Why newline chars aren't stripped from readline() output?
It is necessary to be able to tell whether the last line of the file ends with a trailing newline.
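A minimal sketch of that point: because readline() keeps the line ending, the caller can strip it and still know whether it was there. The helper name read_lines and the sample text are made up for illustration, and an in-memory io.StringIO stands in for the decoded file.

```python
import io


def read_lines(stream):
    """Yield (text, had_newline) pairs for each line of a decoded text stream."""
    while True:
        line = stream.readline()
        if not line:  # an empty string means end of file
            break
        stripped = line.rstrip(u"\r\n")  # drop the trailing '\r' and/or '\n'
        yield stripped, stripped != line


# Hypothetical sample: the last line has no trailing newline.
sample = io.StringIO(u"login: yabcok\r\npassword: secret")
result = list(read_lines(sample))
```

If readline() stripped the ending itself, an empty line and end of file would both come back as an empty string, and the second flag above could not be computed.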
Upvotes: 0
Reputation: 86542
Are you sure that your examples are correct? The documentation of the codecs module says:
Note: Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. This means that no automatic conversion of '\n' is done on reading and writing.
On my system, with a Latin-2 encoded file + DOS line endings, there's no difference between "rt", "r" and "rb" (Disclaimer: I'm using 2.5 on Linux).
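A rough way to check this on your own system is sketched below. The sample content and the temporary file are assumptions for illustration only. The 'rt' mode is left out on purpose, since it is not a documented mode and may not be accepted everywhere.

```python
import codecs
import os
import tempfile

# Hypothetical sample: Latin-2 bytes with Windows line endings.
raw = u"login: yabcok\r\nhas\u0142o: x\r\n".encode("latin2")

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(raw)

    first_lines = {}
    for mode in ("r", "rb"):
        f = codecs.open(path, encoding="latin2", mode=mode)
        try:
            # codecs.open opens the file in binary mode either way,
            # so '\r\n' is expected to survive in both cases.
            first_lines[mode] = f.readline()
        finally:
            f.close()
finally:
    os.remove(path)

print(repr(first_lines["r"]))
print(repr(first_lines["rb"]))
```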
The documentation for open also mentions no "t" flag, so that behavior seems a little strange.
Newline characters are not stripped from lines because not all lines returned by readline may end in newlines. If the file does not end with a newline, the last line does not carry one. (I obviously can't come up with a better explanation).
Newline characters do not differ based on the encoding (at least not among the ones which use ASCII for 0-127), only based on the platform. You can specify "U" in the mode when opening the file and Python will detect any form of newline, either Windows, Mac or Unix.
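If you want decoding and newline detection together, one option (swapped in here, not what the question uses) is io.open, available from Python 2.6, whose default newline handling translates '\r\n', '\r' and '\n' to '\n' after decoding. The sketch below assumes a made-up sample file mixing all three endings.

```python
import io
import os
import tempfile

# Hypothetical sample mixing Windows, old Mac and Unix line endings.
raw = u"login: yabcok\r\nnext\rlast\n".encode("latin2")

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(raw)

    # Default newline=None: every kind of line ending is returned as '\n'.
    with io.open(path, encoding="latin2") as f:
        translated = f.readlines()

    # newline='': line endings are still recognised but left untouched.
    with io.open(path, encoding="latin2", newline="") as f:
        untranslated = f.readlines()
finally:
    os.remove(path)

print(translated)
print(untranslated)
```

Because the translation happens on decoded text rather than on raw bytes, this avoids the problem with multi-byte encodings that makes the 'U' flag unsafe with codecs.open.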
Upvotes: 3