Reputation: 649
My small Python script uses a library to work on some relatively large data. The standard algorithm for this task is a dynamic programming algorithm, so presumably the library "under the hood" allocates a large array to keep track of the partial results of the DP. Indeed, when I try to give it fairly large input, it immediately gives a MemoryError.
Preferably without digging into the depths of the library, I want to figure out if it is worth trying this algorithm on a different machine with more memory, or trying to trim down a bit on my input size, or if it's a lost cause for the data size I am trying to use.
When my Python code throws a MemoryError, is there a "top-down" way for me to find out how much memory my code tried to allocate when the error occurred, e.g. by inspecting the error object?
Upvotes: 17
Views: 1379
Reputation: 516
You can see the memory allocation with Pympler, but you will need to add the debugging statements locally in the library that you are using. Assuming a standard PyPI package, here are the steps:
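2 Use the summary module of Pympler. Place the following inside the main recursion method:
from pympler import muppy, summary

def data_intensive_method(data_xyz):
    # Collect every object the garbage collector currently tracks
    all_objects = muppy.get_objects()
    # Aggregate by type and print count and total size per type
    sum1 = summary.summarize(all_objects)
    summary.print_(sum1)
    ...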
3 Run pip install -e . to install the edited package locally.
Upvotes: 4
Reputation: 1123420
You can't tell this from the MemoryError exception itself. The exception is raised in any situation where a memory allocation failed, including in Python internals that are not directly connected to code creating new Python data structures; some modules create locks or other support objects, and those operations can also fail because memory has run out.
You also can't necessarily know how much memory would be required for the whole operation to succeed. If the library creates several data structures over the course of the operation, the allocation that fails could be a string used as a dictionary key (the last straw), a copy of the whole existing data structure for mutation, or anything in between. The failing allocation says nothing about how much additional memory the remainder of the process would need.
That said, Python can give you detailed information on what memory allocations are being made, and when, and where, using the tracemalloc module. Using that module and an experimental approach, you could estimate how much memory your data set would require to complete.
The trick is to find data sets for which the process can be completed. You'd want to find data sets of different sizes, and you can then measure how much memory those data structures require. You'd create snapshots before and after with tracemalloc.take_snapshot(), compare differences and statistics between the snapshots for those data sets, and perhaps you can extrapolate from that information how much more memory your larger data set would need. It depends, of course, on the nature of the operation and the datasets, but if there is any kind of pattern, tracemalloc is your best shot to discover it.
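The experimental approach described above could look roughly like the following sketch. Here build_table is a hypothetical stand-in for the library call, and the quadratic growth used in estimate_bytes is an assumption that would have to be checked against your own measurements. Besides the snapshot comparison, the sketch also reads the peak from tracemalloc.get_traced_memory(), since the peak is usually what decides whether the run fits in memory.

```python
import tracemalloc

def build_table(n):
    # Hypothetical stand-in for the library call: an n x n DP table.
    return [[0] * n for _ in range(n)]

def measure(func, n):
    """Return (retained_bytes, peak_bytes) for one call of func(n)."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = func(n)  # keep a reference so the data stays alive
        _, peak = tracemalloc.get_traced_memory()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Sum the per-line size differences between the two snapshots.
    retained = sum(s.size_diff for s in after.compare_to(before, "lineno"))
    del result
    return retained, peak

def estimate_bytes(samples, target_n, exponent=2):
    """Extrapolate from the largest sample, ASSUMING memory ~ c * n**exponent."""
    n, peak = max(samples.items())
    return peak * (target_n / n) ** exponent

if __name__ == "__main__":
    peaks = {}
    for n in (100, 200, 400):
        retained, peak = measure(build_table, n)
        peaks[n] = peak
        print(f"n={n}: retained={retained} B, peak={peak} B")
    print(f"estimate for n=20000: {estimate_bytes(peaks, 20000) / 1e9:.2f} GB")
```

If the measured peaks do not grow the way the assumed exponent predicts, the extrapolation is not trustworthy and a different growth model is needed. Note also that tracemalloc only sees allocations made through Python's memory allocators, so memory that a C extension allocates directly may not show up.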
Upvotes: 4
Reputation: 789
It appears that MemoryError is not created with any associated data:
def crash():
    x = 32 * 10 ** 9
    return 'a' * x

try:
    crash()
except MemoryError as e:
    print(vars(e))  # prints: {}
This makes sense - how could it if no memory is left?
I don't think there's an easy way out. You can start from the traceback that the MemoryError causes and investigate with a debugger or use a memory profiler like pympler (or psutil as suggested in the comments).
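As a starting point for the traceback route, here is a minimal sketch that pulls the innermost Python frame out of the exception using the standard traceback module. The helper name allocation_site is made up for illustration, and crash here raises MemoryError directly so that the sketch is safe to run; a real error would come from an actual allocation.

```python
import traceback

def allocation_site(exc):
    """Return (filename, lineno, function, source line) of the innermost frame."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return frame.filename, frame.lineno, frame.name, frame.line

def crash():
    # Simulated failure (assumption for the demo): no huge allocation is made.
    raise MemoryError

try:
    crash()
except MemoryError as e:
    # Shows where the error surfaced, not how many bytes were requested.
    print(allocation_site(e))
```

This only tells you where the allocation failed. If the failure happens inside a C extension, the innermost frame is the Python line that called into it, which is still a useful place to set a breakpoint or add profiling.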
Upvotes: 2