Mawg

Reputation: 40140

File contents not as long as expected

with open(sourceFileName, 'rt') as sourceFile:
    sourceFileConents = sourceFile.read()
    sourceFileConentsLength = len(sourceFileConents)

    i = 0
    while i < sourceFileConentsLength:
        print(str(i) + ' ' + sourceFileConents[i])
        i += 1

Please forgive the unPythonic index-based loop; this is only test code, and there are reasons to do it that way in the real code.

Anyhoo, the real code seemed to be ending the loop sooner than expected, so I knocked up the dummy above, which removes all of the logic of the real code.

The sourceFileConentsLength reports as 13,690, but when I print the contents out character by character, a few hundred characters at the end of the file are never printed.

What gives?


[Update] I think that we can strike two of those ideas.

For maximum string length, see this question.

I did an ls -lAF to a temp directory. That is only 6k+ chars, but the script handled it just fine. Should I be worrying about line endings? If so, what can I do about it? The source files tend to get edited under both Windows & Linux, but the script will only run under Linux.


[Update++] I changed the line endings on my input file to Linux in Eclipse, but still got the same result.

Upvotes: 4

Views: 90

Answers (2)

Tui Popenoe

Reputation: 2114

If your file is encoded in something like UTF-8, you should decode it before counting the characters:

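# Read the raw bytes, then decode them so that len() counts characters, not bytes.
with open(sourceFileName, 'rb') as sourceFile:
    sourceFileContents_utf8 = sourceFile.read()
sourceFileContents_unicode = sourceFileContents_utf8.decode('utf8')
print(len(sourceFileContents_unicode))

i = 0
source_file_contents_length = len(sourceFileContents_unicode)
while i < source_file_contents_length:
    # Index the decoded text, one character per iteration.
    print('%s %s' % (i, sourceFileContents_unicode[i]))
    i += 1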
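
To see why decoding matters for the count, here is a minimal sketch comparing the byte length and the character length of the same text. The sample string is made up for illustration and has nothing to do with the asker's file; it only shows that the two lengths can differ once non-ASCII characters are involved.

```python
# Hypothetical sample text containing two non-ASCII characters.
text = u'na\u00efve caf\u00e9'

# Encoding to UTF-8 turns each non-ASCII character here into two bytes.
encoded = text.encode('utf-8')

byte_count = len(encoded)                  # length measured in bytes
char_count = len(encoded.decode('utf-8'))  # length measured in characters

print(byte_count, char_count)
```

If the two numbers differ for your file, a size reported by a tool that counts bytes will not match the length of the decoded string.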

Upvotes: 1

Hugh Bothwell

Reputation: 56634

If you read a file in text mode it will automatically convert line endings like \r\n to \n.

Try using

with open(sourceFileName, newline='') as sourceFile:

instead; this will turn off newline-translation (\r\n will be returned as \r\n).
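
A small sketch of the difference, assuming Python 3 (where the built-in open accepts newline). The helper name read_lengths and the temporary demo file are made up for illustration; they are not part of the asker's script.

```python
import os
import tempfile


def read_lengths(path):
    """Return (translated_length, raw_length) for the text file at path."""
    # Default text mode: \r\n is translated to \n while reading.
    with open(path, 'rt') as f:
        translated = f.read()
    # newline='' disables the translation, so \r\n stays as two characters.
    with open(path, 'rt', newline='') as f:
        raw = f.read()
    return len(translated), len(raw)


# Write a throwaway file with Windows line endings, in binary mode so the
# bytes land on disk exactly as given.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'one\r\ntwo\r\nthree\r\n')

translated_len, raw_len = read_lengths(demo_path)
os.remove(demo_path)

print(translated_len, raw_len)
```

The two lengths differ by one per \r\n line ending, so comparing them on your own file is a quick way to check whether newline translation accounts for the gap.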

Upvotes: 2
