user10332687

Read first N MB of a file

I am looking to get the first N MB of a file. Here is a basic implementation:

def get_first_n_mb(self, file=None, n=5):
    """
    Will return the first 5 (or N) MB of the passed file
    """
    file = file or self.file

    with open(file, 'rb') as fp:
        # read() requires an integer byte count; 1e6 * n is a float
        file_data = self.file_first_n_mb = fp.read(int(1e6 * n))

    return file_data

However, the user may pass a large number, such as n = 1000, in which case we would want to read in chunks. What would be a good chunk size, or would the above approach still work? How could it be improved?

Upvotes: 0

Views: 596

Answers (1)

Barmar

Reputation: 781096

read() is permitted to return less than the amount you asked for. You should call it in a loop until you reach the amount requested or EOF. You need to keep reducing the amount you need to read by the size of the last read.

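def get_first_n_mb(self, file=None, n=5):
    file = file or self.file
    amt = int(1e6 * n)  # bytes still needed; read() requires an int
    chunks = []         # the file is opened in binary mode, so collect bytes
    with open(file, 'rb') as fp:
        while amt > 0:
            block = fp.read(amt)
            if not block:
                # read() returns b'' at EOF; it does not raise EOFError
                break
            chunks.append(block)
            amt -= len(block)
    return b''.join(chunks)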

For ordinary files read() will normally return as much as you request, as long as the file is that long. But other types of streams will often return less (e.g. reading from a terminal will usually just return one line).
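
On the chunk-size part of the question, here is a minimal sketch of the same loop as a standalone function that also caps each individual request. The name read_first_bytes and the chunk_size parameter (defaulting to 1 MiB) are assumptions made for illustration, not part of the original code, and the default is not a tuned value.

```python
def read_first_bytes(path, limit, chunk_size=1024 * 1024):
    """Return up to `limit` bytes from the start of the file at `path`.

    `chunk_size` is a hypothetical tuning knob: it bounds how much is
    requested from read() in a single call.
    """
    remaining = int(limit)
    chunks = []
    with open(path, 'rb') as fp:
        while remaining > 0:
            # Never ask for more than one chunk, or more than is still needed.
            block = fp.read(min(chunk_size, remaining))
            if not block:
                # An empty bytes object means EOF: the file was shorter than `limit`.
                break
            chunks.append(block)
            remaining -= len(block)
    return b''.join(chunks)
```

Calling read_first_bytes(some_path, 5 * 10**6) would return at most 5 MB, or the whole file if it is shorter. The result is still held in memory in full, so the chunk size only limits each read call, not the total memory used.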

Upvotes: 1
