Jonathan Vanasco

Reputation: 15680

python - profile the memory cost of all imports?

I'm not quite sure if this is possible or not...

I have a large Python application that grows to a large memory size. I am hoping to track the process growth by import statement to minimize that if possible.

The closest that I've found is the line profiling feature of memory_profiler. That will only profile the memory of a "toplevel" import statement though - and I want a breakdown of all the subordinate imports. I haven't found any profiler that can track memory size to the import statement.

This is a concern not just for optimizing our own code, but also because a recent audit showed some PyPI modules enabling third-party framework support by simply dropping the import statement in a try/except block.

For example, one library did this:

try:
    import bottle
    # declare bottle support here
except:
    pass

While my app is deployed in a virtualenv, there are several other sibling services that are part of the deployment and run in the same virtualenv... one of which uses bottle.

This "pattern" is used in a handful of libraries I use, and the overhead of unwanted/unneeded modules is a decent amount of this application's memory footprint (based on manually isolating and measuring them). I would like to figure out which libraries to prioritize patching and which ones I can safely ignore.
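
One way to do that kind of manual isolation is to check which modules a single import drags in. The following is a minimal sketch using only the standard library; the helper name `newly_loaded_modules` is hypothetical, and it only lists module names, it does not measure memory.

```python
import importlib
import sys


def newly_loaded_modules(name):
    """Import `name` and return the names of modules that the import added."""
    # Snapshot the module cache, import, then diff against the snapshot.
    before = set(sys.modules)
    importlib.import_module(name)
    return set(sys.modules) - before
```

Calling it in a fresh interpreter on a library that uses the try/except pattern would show the optional framework (bottle, in the example above) among the returned names if it happens to be installed.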

Upvotes: 3

Views: 360

Answers (1)

Jonathan Vanasco

Reputation: 15680

After not having much luck, I had a wacky idea and it somewhat works.

I overrode the import statement to measure the current process's memory before and after every import. I don't think this covers every import situation, but it's a good start. I simply printed the results, copy/pasted them into a file, and then did some quick preprocessing to turn that into a CSV that tracks the index and percent growth/total of each call. That's enough for my current needs.

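import builtins
import os
import psutil
# Handle to the current process, used to read its resident set size (RSS).
this_process = psutil.Process(os.getpid())
# Keep a reference to the original import hook so the wrapper can delegate to it.
realimport = builtins.__import__
def myimp(name, *args, **kwargs):
    # Read RSS before and after the real import. Nested imports print first,
    # and their growth is included in the parent import's figure.
    _mem_start = this_process.memory_info().rss
    r = realimport(name, *args, **kwargs)
    _mem_finish = this_process.memory_info().rss
    _mem_growth = _mem_finish - _mem_start
    print("import|%s,%s,%s,%s" % (name, _mem_growth, _mem_start, _mem_finish))
    return r
# Install the wrapper; subsequent import statements go through it.
builtins.__import__ = myimp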
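
The preprocessing step described above could look roughly like this. It is a minimal sketch, not the original script: the function name `import_log_to_csv` and the column names are hypothetical, it assumes the `import|name,growth,start,finish` line format printed by the hook, and it assumes "total" means the largest memory reading in the log.

```python
import csv
import io

PREFIX = "import|"


def import_log_to_csv(log_text):
    """Convert 'import|name,growth,start,finish' lines into CSV text."""
    rows = []
    for line in log_text.splitlines():
        if not line.startswith(PREFIX):
            continue  # skip unrelated program output
        # Split from the right so the three numeric fields are always last.
        name, growth, start, finish = line[len(PREFIX):].rsplit(",", 3)
        rows.append((name, int(growth), int(start), int(finish)))
    # Assumption: "total" is the largest memory reading seen in the log.
    total = max((row[3] for row in rows), default=0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["index", "name", "growth", "start", "finish", "pct_of_total"])
    for index, (name, growth, start, finish) in enumerate(rows):
        pct = round(100.0 * growth / total, 2) if total else 0.0
        writer.writerow([index, name, growth, start, finish, pct])
    return out.getvalue()
```

Because a parent import's figure includes the growth of the imports nested inside it, the percentages are not additive; sorting by growth still gives a usable ranking of which imports to look at first.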

There are better ways to do the above, and I still hope there are better ways to profile an app like this. For now, I've got a working solution.
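
One possible refinement, sketched here under stated assumptions, is to wrap the hook in a context manager so the original `__import__` is always restored and the results are collected in a list instead of printed. The name `track_imports` is hypothetical. This sketch also swaps psutil for the standard library's `tracemalloc`, which only counts allocations made through Python's memory allocators, so its numbers will be lower than process RSS and will miss memory allocated directly by C libraries.

```python
import builtins
import contextlib
import tracemalloc


@contextlib.contextmanager
def track_imports():
    """Record (name, growth_bytes) for each import made inside the block."""
    records = []
    real_import = builtins.__import__
    # Only start (and later stop) tracing if nobody else already started it.
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()

    def tracking_import(name, *args, **kwargs):
        before, _peak = tracemalloc.get_traced_memory()
        module = real_import(name, *args, **kwargs)
        after, _peak = tracemalloc.get_traced_memory()
        records.append((name, after - before))
        return module

    builtins.__import__ = tracking_import
    try:
        yield records
    finally:
        # Always restore the original hook, even if an import raised.
        builtins.__import__ = real_import
        if started_here:
            tracemalloc.stop()
```

Used as `with track_imports() as records:` around the application's import statements, `records` ends up holding one entry per import call, with nested imports appearing before the import that triggered them.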

Upvotes: 1
