kemri

Reputation: 179

Python mmap "memory leak"

I have a reasonably large file (~4 GB on disk) that I want to access with Python's mmap module to gain some familiarity with memory maps. I have a 64-bit system and am running something similar to the example below. When I run it, I notice that the process's memory consumption continually increases. I've profiled it with pympler and nothing stands out. Can someone point me to some resources that describe what's going on under the hood and how to correct this (so I can scan through the file without this "memory leak" consuming all my memory)? Thanks!

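import mmap

# Open in binary mode; the mapping stays valid after the file object is closed.
with open("/path/to/large.file", "rb") as j:
    mm = mmap.mmap(j.fileno(), 0, access=mmap.ACCESS_READ)

pos = 0
while True:
    new_pos = mm.find(b"10", pos)
    if new_pos == -1:  # find() returns -1 once there are no more matches
        break
    print(new_pos)
    pos = new_pos + 1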

EDIT: The file looks something like this:

0000001, data
0000002, more data
...
...

With this many sequential values in the first column, there will be a lot of hits for find(b"10").

Upvotes: 1

Views: 582

Answers (1)

Tim Boddy

Reputation: 1069

Gather a live core of your process and use chap (open source software available at https://github.com/vmware/chap) to analyze that core.
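
One way to gather a live core on Linux is gcore, which ships with gdb. The sketch below is only an illustration under that assumption: it wraps the gcore command line from Python, and the helper names gcore_command and dump_live_core are hypothetical, not part of chap or gdb.

```python
import shutil
import subprocess


def gcore_command(pid, prefix="core"):
    """Build the gcore invocation; gcore names its output '<prefix>.<pid>'."""
    return ["gcore", "-o", prefix, str(pid)]


def dump_live_core(pid, prefix="core"):
    """Run gcore against a running process and return the expected core path.

    Assumes gcore is on PATH and that you have permission to attach to pid.
    """
    if shutil.which("gcore") is None:
        raise RuntimeError("gcore not found; it is normally installed with gdb")
    subprocess.run(gcore_command(pid, prefix), check=True)
    return f"{prefix}.{pid}"
```

Calling dump_live_core with the PID of the running Python process should leave a core file that can then be opened with chap.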

Here are some chap commands that are relevant for this use case:

describe used

This will describe all the used allocations (allocated either by Python or by native code), but won't tell you directly about any regions that have been mmapped.

describe free

This will show allocations that have been freed but for which the associated space has not been given back to the operating system.

describe writable
describe readonly

These will tell you about larger regions that are writable or read-only, respectively. In your case, where you specified ACCESS_READ for the mmapped allocation, that allocation, if still present, would appear in the output of "describe readonly" as an unknown region, or as part of such a region.
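
As a rough cross-check that does not involve chap, the same kind of region can be seen on Linux in /proc/<pid>/maps, where each line gives an address range, permissions, offset, device, inode and an optional path. The sketch below is an assumption-laden illustration, not chap's method: the function name readonly_regions is hypothetical, and it simply keeps mappings whose permissions are readable and not writable.

```python
def readonly_regions(maps_text):
    """Return (start, end, size, path) for each readable, non-writable mapping.

    maps_text is expected to be in the Linux /proc/<pid>/maps format.
    """
    regions = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        addr_range, perms = fields[0], fields[1]
        if not perms.startswith("r-"):
            continue  # not readable, or writable
        start_s, end_s = addr_range.split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, end - start, path))
    return regions
```

Passing it the contents of /proc/self/maps from inside the scanning process should list the mapped file, if it is still mapped, with a size close to the file's size on disk.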

Upvotes: 1
