charjabug

Reputation: 38

How to copy a large portion of a raw filesystem to a file?

I'm working with an arcane data collection filesystem. It has a block describing the files and their exact offsets on disk, so I know each file's start byte, end byte and length in bytes. The goal is to grab one file from the physical disk. They're big files, so performance is paramount.

Here's what "works," but very inefficiently:

import shutil, io
def start_copy(startpos, endpos, filename="C:\\out.bin"):
    with open(r"\\.\PhysicalDrive1", 'rb') as src_f:
        src_f.seek(startpos)
        flength = endpos - startpos
        print("Starting copy of "+filename+" ("+str(flength)+"B)")
        with open(filename, 'wb') as dst_f:
            shutil.copyfileobj( io.BytesIO(src_f.read(flength)), dst_f )
        print("Finished copy of "+filename)

This is slow: io.BytesIO(src_f.read(flength)) technically works, but it reads the entire file into memory before writing to the destination file. So it takes much longer than it should.

Passing src_f directly to shutil.copyfileobj won't work. (I assume) the end position can't be specified, so the copy doesn't stop.

Here are some questions, each of which could be a solution to this:

Upvotes: 1

Views: 909

Answers (1)

abarnert

Reputation: 365717

The obvious way to do this is to just write to the file.

The whole point of copyfileobj is that it buffers the data for you. If you have to read the whole file into a BytesIO, you're just buffering the BytesIO, which is pointless.

So, just loop around reading a decent-sized buffer from src_f and write it to dst_f until you reach flength bytes.

If you look at the shutil source (which is linked from the shutil docs), there's no magic inside copyfileobj; it's a trivial function. As of 3.6 (and I think it's been completely unchanged since shutil was added somewhere around 2.1…), it looks like this:

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

You can do the same thing, just keeping track of bytes read and stopping at flength:

def copypartialfileobj(fsrc, fdst, size, length=16*1024):
    """copy size bytes from file-like object fsrc to file-like object fdst"""
    written = 0
    while written < size:
        buf = fsrc.read(min(length, size - written))
        if not buf:
            break
        fdst.write(buf)
        written += len(buf)
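
If it helps, here is a rough sketch of how the same loop could replace the BytesIO step in your start_copy. The name copy_range, the start/end parameters, and the 1 MiB default buffer are my own choices for illustration, not anything from shutil. It also returns the byte count so the caller can notice a short read. The demo uses in-memory io.BytesIO objects in place of the raw drive, so it says nothing about actual disk performance.

```python
import io


def copy_range(fsrc, fdst, start, end, bufsize=1024 * 1024):
    """Copy bytes [start, end) from fsrc to fdst; return bytes copied.

    Hypothetical helper: the name, signature and buffer size are
    illustrative assumptions, not part of shutil.
    """
    fsrc.seek(start)
    remaining = end - start
    copied = 0
    while remaining > 0:
        # Never ask for more than what is left of the range.
        buf = fsrc.read(min(bufsize, remaining))
        if not buf:
            break  # source ended before the range was complete
        fdst.write(buf)
        copied += len(buf)
        remaining -= len(buf)
    return copied


# Demo with in-memory stand-ins for the drive and the output file.
data = bytes(range(256)) * 4          # 1024 bytes of sample data
src = io.BytesIO(data)
dst = io.BytesIO()
n = copy_range(src, dst, start=100, end=612, bufsize=64)
print(n)                              # 512
```

In start_copy you would open the drive and the output file exactly as before and call copy_range(src_f, dst_f, startpos, endpos) in place of the shutil.copyfileobj line. Only one buffer's worth of data is held in memory at a time.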

Upvotes: 2
