Reputation:
I am looking to get the first N MB of a file. Here is a basic implementation:
def get_first_n_mb(self, file=None, n=5):
    """
    Will return the first 5 (or N) MB of the passed file
    """
    file = file or self.file
    with open(file, 'rb') as fp:
        file_data = self.file_first_n_mb = fp.read(1e6 * n)
    return file_data
However, the user may pass a large number, such as n = 1000, in which case we would want to read in chunks. What would be a good chunk size, or would the above approach still work? How could it be improved?
Upvotes: 0
Views: 596
Reputation: 781096
read() is permitted to return less than the amount you asked for. You should call it in a loop until you have the amount requested or reach EOF, reducing the amount still needed by the size of each read.
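def get_first_n_mb(self, file=None, n=5):
    file = file or self.file
    amt = int(1e6 * n)  # read() needs an integer byte count
    file_data = b''  # the file is opened in binary mode, so accumulate bytes
    with open(file, 'rb') as fp:
        while amt > 0:
            block = fp.read(amt)
            if not block:  # read() signals EOF with an empty result, not EOFError
                break
            file_data += block
            amt -= len(block)
    return file_data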
For ordinary files, read() will normally return as much as you request, as long as the file is that long. Other types of streams will often return less (e.g. reading from a terminal will usually return just one line).
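On the chunk-size part of the question, here is a rough sketch of the same loop with each read capped at a fixed chunk size. The names read_up_to, CHUNK_SIZE and ShortReader are made up for illustration, and the 1 MiB chunk size is an arbitrary assumption rather than a recommendation. ShortReader is only there to imitate a stream that returns less than you ask for.

```python
import io

CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB); adjust to taste


def read_up_to(fp, limit, chunk_size=CHUNK_SIZE):
    """Read at most `limit` bytes from binary stream `fp`, tolerating short reads."""
    buf = bytearray()  # appending to a bytearray avoids re-copying on every +=
    while len(buf) < limit:
        # Never ask for more than one chunk, or more than is still needed.
        block = fp.read(min(chunk_size, limit - len(buf)))
        if not block:  # an empty result means EOF
            break
        buf += block
    return bytes(buf)


class ShortReader:
    """Hypothetical stream that returns at most 3 bytes per read() call."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size):
        return self._inner.read(min(size, 3))


first_eight = read_up_to(ShortReader(b"abcdefghij"), 8)
short_file = read_up_to(io.BytesIO(b"abc"), 10)
```

With a real file you would pass the open file object, e.g. read_up_to(fp, int(1e6 * n)). The loop stops as soon as it has the requested number of bytes or hits EOF, whichever comes first.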
Upvotes: 1