Reputation: 40140
with open(sourceFileName, 'rt') as sourceFile:
sourceFileConents = sourceFile.read()
sourceFileConentsLength = len(sourceFileConents)
i = 0
while i < sourceFileConentsLength:
print(str(i) + ' ' + sourceFileConents[i])
i += 1
Please forgive the unPythonic for i
loop, this is only the test code & there are reasons to do it that way in the real code.
Anyhoo, the real code seemed to be ending the loop sooner than expected, so I knocked up the dummy above, which removes all of the logic of the real code.
The sourceFileConentsLength
reports as 13,690, but when I print it out char for char, there are still a few 100 chars more in the file, which are not being printed out.
What gives?
<fileHandle>.read()
to get the file's entire contents into a single string? [Update] I think that we strike two of those ideas.
For maximum string length, see this question.
I did an ls -lAF
to a temp directory. Only 6k+ chars, but the script handed it just fine. Should I be worrying about line endings? If so, what can I do about it? The source files tend to get edited under both Windows & Linux, but the script will only run under Linux.
[Updfate++] I changed the line endings on my input file to Linux in Eclipse, but still got the same result.
Upvotes: 4
Views: 90
Reputation: 2114
If your file is encoded in something like UTF-8
, you should decode it before counting the characters:
sourceFileContents_utf8 = open(sourceFileName, 'r+').read()
sourceFileContents_unicode = sourceFileContents_utf8.decode('utf8')
print(len(sourceFileContents_unicode))
i = 0
source_file_contents_length = len(sourceFileContents_unicode)
while i < source_file_contents_length:
print('%s %s' % (str(i), sourceFileContents[i]))
i += 1
Upvotes: 1
Reputation: 56634
If you read a file in text mode it will automatically convert line endings like \r\n
to \n
.
Try using
with open(sourceFileName, newline='') as sourceFile:
instead; this will turn off newline-translation (\r\n
will be returned as \r\n
).
Upvotes: 2