codie

Reputation: 353

Why is mmap in Python so slow?

I have a 1 GB binary example file that I load into memory. In a benchmark on Python 3.7 and Windows, mmap loses severely to readinto in terms of performance. The following code runs the benchmark: the first routine uses a plain readinto to read the first N bytes of the file into a buffer, whereas the second routine uses mmap to map the first N bytes and copy them into the same buffer.

import numpy as np
import time
import mmap
import os
import matplotlib.pyplot as plt

def mmap_perf():

    filepath = "test.bin"
    filesize = os.path.getsize(filepath)

    MEGABYTES = 10**6
    batch_size = 10 * MEGABYTES

    mview = memoryview(bytearray(filesize))

    batch_sizes = []
    load_durations = []
    for i_part in range(1, filesize // batch_size):

        start_time = time.time()
        with open(filepath, "br") as fp:
            # start = i_part * batch_size
            fp.seek(0)
            fp.readinto(mview[0:batch_size * i_part])
        duration_readinto = time.time() - start_time

        start_time = time.time()
        with open(filepath, "br") as fp:
            length = (i_part * batch_size // mmap.ALLOCATIONGRANULARITY + 1) * \
                mmap.ALLOCATIONGRANULARITY
            with mmap.mmap(fp.fileno(),
                           offset=0,
                           length=length,
                           access=mmap.ACCESS_READ) as mp:
                mview[0:i_part * batch_size] = mp[0:i_part * batch_size]
        duration_mmap = time.time() - start_time

        msg = "{2}MB\nreadinto: {0:.4f}\nmmap:     {1:.4f}"
        print(msg.format(duration_readinto, duration_mmap, i_part * batch_size // MEGABYTES))

        batch_sizes.append(batch_size * i_part // MEGABYTES)
        load_durations.append((duration_readinto, duration_mmap))

    load_durations = np.asarray(load_durations)
    plt.plot(batch_sizes, load_durations)
    plt.show()

The plot looks as follows:

[Plot: load duration of readinto and mmap against batch size in MB]

I just can't understand how mmap loses so clearly, even when loading small batches of just 10 MB from a 1 GB file.

Upvotes: 0

Views: 1660

Answers (1)

Tong Zhang

Reputation: 11

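For a sequential read workload like this one, readinto goes through the read system call and benefits heavily from the OS's read-ahead (prefetching). A memory mapping is instead filled in page by page through page faults, so you may have to set MAP_POPULATE (a Linux-specific flag) for mmap to enjoy the same benefit. If you test a random-read workload, you will see a very different comparison.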
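
As a rough illustration of the prefetch hint, here is a minimal sketch, not a tuned benchmark. It assumes a platform where Python's mmap module exposes MAP_POPULATE (Linux, Python 3.10+) or madvise with MADV_WILLNEED (some Unix systems, Python 3.8+). As far as I know neither is exposed on Windows, so there the sketch falls back to a plain read-only mapping. The helper names open_prefetched_map and read_prefix are made up for this example.

```python
import mmap


def open_prefetched_map(fileno, length=0):
    """Map a file read-only and ask the OS to load its pages up front.

    length must not exceed the file size; 0 maps the whole file.
    """
    populate = getattr(mmap, "MAP_POPULATE", None)  # Linux only, Python 3.10+
    if populate is not None:
        # Prefault the mapped pages at mmap() time instead of on first access.
        return mmap.mmap(fileno, length,
                         flags=mmap.MAP_SHARED | populate,
                         prot=mmap.PROT_READ)

    # Fallback: plain read-only mapping (this is the only path on Windows).
    mp = mmap.mmap(fileno, length, access=mmap.ACCESS_READ)
    willneed = getattr(mmap, "MADV_WILLNEED", None)
    if willneed is not None and hasattr(mp, "madvise"):
        # Advisory hint that the whole mapping will be needed soon.
        mp.madvise(willneed)
    return mp


def read_prefix(path, nbytes):
    """Return the first nbytes of the file at path, read through a mapping."""
    with open(path, "rb") as fp:
        with open_prefetched_map(fp.fileno(), nbytes) as mp:
            return bytes(mp[:nbytes])
```

Whether this closes the gap to readinto depends on the OS and on whether the file is already in the page cache, so it is worth timing both variants on the target machine.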

Upvotes: 1
