Reputation: 179
I have a reasonably large file (~4 GB on disk) that I want to access with Python's mmap
module to gain some familiarity with memory maps. I have a 64-bit system and am running something similar to the example below. When I run it, I notice that the process's memory consumption continually increases. I've profiled it with pympler
and nothing stands out. Can someone point me to some resources that describe what's going on under the hood and how to correct this (so I can scan through the file without this "memory leak" consuming all my memory)? Thanks!
import mmap

with open("/path/to/large.file", "rb") as j:
    mm = mmap.mmap(j.fileno(), 0, access=mmap.ACCESS_READ)
    pos = 0
    for i in range(mm.size()):
        new_pos = mm.find(b"10", pos)
        if new_pos == -1:
            # no more matches; without this check pos would wrap to 0 and rescan
            break
        print(new_pos)
        pos = new_pos + 1
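For reference, one commonly suggested way to bound the resident size is to tell the kernel it may reclaim pages that have already been scanned. A minimal sketch of that idea, assuming Linux and Python 3.8+ (mmap.madvise and mmap.MADV_DONTNEED are platform-dependent, and the 64 MiB step size is an arbitrary choice):

import mmap

CHUNK = 64 * 1024 * 1024  # advise in 64 MiB steps (arbitrary)

with open("/path/to/large.file", "rb") as j:
    mm = mmap.mmap(j.fileno(), 0, access=mmap.ACCESS_READ)
    pos = 0
    advised = 0  # offset up to which the kernel has been told to drop pages
    while True:
        new_pos = mm.find(b"10", pos)
        if new_pos == -1:
            break
        pos = new_pos + 1
        if pos - advised >= CHUNK:
            # madvise offsets must be page-aligned; round pos down to a page
            end = pos - (pos % mmap.PAGESIZE)
            mm.madvise(mmap.MADV_DONTNEED, advised, end - advised)
            advised = end
    mm.close()

With a read-only, file-backed mapping, MADV_DONTNEED only drops this process's resident pages; the data is re-read from the page cache or disk if touched again.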
EDIT: The file looks something like this:
0000001, data
0000002, more data
...
...
And with sequential values like these in the first column, there will be a lot of hits for find(b"10") (see the short illustration below).
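For instance, with hypothetical rows in the same format:

for row in (b"0000010, data", b"0000100, more data", b"0000102, more data"):
    print(row.find(b"10"))  # prints 5, 4, 4 -- each of these rows contains "10"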
Upvotes: 1
Views: 582
Reputation: 1069
Gather a live core of your process and use chap (open-source software available at https://github.com/vmware/chap) to analyze that core.
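A minimal sketch of capturing and opening such a core on Linux, assuming gdb's gcore utility is installed (<pid> is a placeholder for the python process id):

gcore <pid>       # dumps a live core to core.<pid> without killing the process
chap core.<pid>   # opens an interactive chap prompt on that core

The commands below are then entered at the chap prompt.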
Here are some commands that are relevant for this use case:
describe used
This will describe all the used allocations (whether allocated by Python or by native code), but won't tell you directly about any regions that have been mmapped.
describe free
This will show allocations that have been freed but for which the associated space has not been given back to the operating system.
describe writable
describe readonly
These will tell you about larger memory regions that are writable or read-only, respectively. In your case, where you specified ACCESS_READ for the mmapped file, that mapping, if still present, would show up in the output of "describe readonly" as an unknown region, or as part of such a region.
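As a quick cross-check from inside the process itself (a Linux-only sketch, reusing the hypothetical /path/to/large.file from the question), the mapping should also be visible as a read-only, file-backed region in /proc/self/maps:

import mmap

with open("/path/to/large.file", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Print the kernel's view of this mapping; the region appears as a
    # read-only range (perms like r--s) tagged with the file's path.
    with open("/proc/self/maps") as maps:
        for line in maps:
            if "large.file" in line:
                print(line, end="")
    mm.close()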
Upvotes: 1