Faultier

Reputation: 1326

Python readlines() does not return the whole file

I have an auto-generated info file coming from a measurement. It consists of both binary and human-readable parts, and I want to extract some of the non-binary metadata. For some files I cannot get to the metadata, because readlines() does not yield the whole file. I guess that the file contains some EOF character. I can open the file in Notepad++ without problems.

A possible solution would be to read the file in binary mode and convert it to characters afterwards, deleting the EOF character while doing so. Still, I wonder if there is a more elegant way to do this?

Edit: The question was rightfully downvoted; I should have provided code. I actually use

f = open(fname, 'r')
raw = f.readlines()

and then walk through the list. The EOF characters that are present (depending on the OS) seem to cause the havoc I am observing. I will accept the answer that suggests using the binary 'rb' flag. By the way, this was an impressive response time! (-:

Upvotes: 0

Views: 4572

Answers (2)

Joran Beasley

Reputation: 114068

with open(afile, "rb") as f:
    print(f.readlines())

What's the problem with doing this?

If you don't open the file in binary mode, some non-ASCII bytes are interpreted incorrectly and/or discarded, which may inadvertently also drop ASCII text that is mixed in with the binary data.
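To make this concrete, here is a minimal sketch of reading a mixed file in binary mode and then pulling out the readable lines. One likely culprit on Windows with Python 2 is the byte 0x1A (Ctrl-Z), which text mode treats as end-of-file; binary mode passes it through untouched. The helper name `read_text_lines`, the sample bytes, and the "key: value" metadata layout are assumptions for illustration only, not the asker's actual file format.

```python
import os
import tempfile

def read_text_lines(path, encoding="ascii"):
    """Read a mixed binary/text file and return every line as a string.

    Bytes that are invalid in `encoding` are replaced instead of raising.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # bytes.splitlines() splits on \n, \r\n and \r and drops the line endings.
    return [line.decode(encoding, errors="replace") for line in raw.splitlines()]

# Hypothetical sample: a binary line containing 0x1A, then two metadata lines.
sample = b"\x00\x01\x1a\xff\r\nOperator: demo\r\nSamples: 1024\r\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

lines = read_text_lines(sample_path)
os.remove(sample_path)

# Assumed metadata layout: "key: value" pairs, one per line.
meta = [line for line in lines if ": " in line]
print(meta)
```

Decoding with errors="replace" keeps the line count intact even when a binary line contains bytes outside the chosen encoding; filtering for the metadata afterwards is then ordinary string handling.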

Upvotes: 5

user2443147

Reputation:

You can use the read() function of the file object. It reads the whole file.

with open('input.bin', 'r') as f:
    content = f.read()

Then you can parse the content. If you know where the part you need starts, you can seek to it (e.g. if the file has a fixed-length binary start):

# CONTENT_START is the byte offset where the readable part begins; define it for your file
with open('input.bin', 'r') as f:
    f.seek(CONTENT_START)
    content = f.read()

On Windows, you should change the mode to 'rb' to read the file in binary mode. Note that no newline translation takes place in binary mode, so line endings in the text part may then consist of '\r\n', depending on how the file was created in the first place.
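A small sketch of the seek approach in binary mode, including the '\r\n' handling mentioned above. The 4-byte header length, the helper name `read_text_after_header`, and the sample content are assumptions made up for this example; a real file would need its own offset.

```python
import os
import tempfile

HEADER_SIZE = 4  # hypothetical length of the binary header, in bytes

def read_text_after_header(path, offset):
    """Skip `offset` bytes, then return the rest of the file as text."""
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read()
    # Binary mode does not translate newlines, so normalise '\r\n' by hand.
    return raw.decode("ascii", errors="replace").replace("\r\n", "\n")

# Hypothetical sample: 4 binary bytes followed by a Windows-style text part.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x1a\xff\x7f" + b"name=run1\r\nunits=mV\r\n")
    demo_path = tmp.name

text = read_text_after_header(demo_path, HEADER_SIZE)
os.remove(demo_path)
print(text)
```

Seeking by byte offset is only reliable in binary mode, which is another reason to prefer 'rb' for files of this kind.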

Upvotes: 0
