Mees de Vries

Reputation: 649

Can I find out the allocation request that caused my Python MemoryError?

Context

My small Python script uses a library to work on some relatively large data. The standard algorithm for this task is a dynamic programming algorithm, so presumably the library "under the hood" allocates a large array to keep track of the partial results of the DP. Indeed, when I try to give it fairly large input, it immediately gives a MemoryError.

Preferably without digging into the depths of the library, I want to figure out if it is worth trying this algorithm on a different machine with more memory, or trying to trim down a bit on my input size, or if it's a lost cause for the data size I am trying to use.

Question

When my Python code raises a MemoryError, is there a "top-down" way to find out how much memory the failing allocation tried to request, e.g. by inspecting the error object?

Upvotes: 17

Views: 1379

Answers (3)

amirathi

Reputation: 516

You can inspect memory allocation with Pympler, but you will need to add debugging statements locally to the library you are using. Assuming a standard PyPI package, here are the steps:

  1. Clone the package locally.

  2. Use Pympler's summary module. Place the following inside the main recursion method:

   from pympler import muppy, summary

   def data_intensive_method(data_xyz):
       # Collect every object the garbage collector currently tracks
       all_objects = muppy.get_objects()
       # Summarize them by type and print the table to the console
       sum1 = summary.summarize(all_objects)
       summary.print_(sum1)
       ...

  3. Run pip install -e . to install the edited package locally.
  4. Run your main program and check the console for memory usage at each iteration.

Upvotes: 4

Martijn Pieters

Reputation: 1123420

You can't tell this from the MemoryError exception itself. The exception is raised whenever a memory allocation fails, including in Python internals that are not directly connected to code creating new Python data structures. Some modules create locks or other support objects, and those operations can also fail once memory has run out.

You also can't necessarily know how much memory the whole operation would need to succeed. If the library creates several data structures over the course of the operation, the allocation that fails could be the last straw, such as a string used as a dictionary key, or it could be a copy of the whole existing data structure for mutation, or anything in between. Either way, the failing allocation says nothing about how much additional memory the remainder of the process would need.

That said, Python can give you detailed information on what memory allocations are being made, and when, and where, using the tracemalloc module. Using that module and an experimental approach, you could estimate how much memory your data set would require to complete.
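As a minimal sketch of what tracemalloc reports, the snippet below traces a stand-in workload and lists the source lines holding the most memory. The build_table function is hypothetical and only takes the place of the real library call.

```python
import tracemalloc

def build_table(n):
    # Hypothetical stand-in for the library call: an n x n table of zeros.
    return [[0] * n for _ in range(n)]

tracemalloc.start()
table = build_table(300)  # keep a reference so the table is still alive
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group live allocations by source line, largest first.
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)
```

Each printed entry names a file and line number together with the total size and number of blocks allocated there. Memory that a C extension allocates without going through Python's allocators does not show up in these statistics.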

The trick is to find data sets for which the process can be completed. You'd want to find data sets of different sizes, and you can then measure how much memory those data structures require. You'd create snapshots before and after with tracemalloc.take_snapshot(), compare differences and statistics between the snapshots for those data sets, and perhaps you can extrapolate from that information how much more memory your larger data set would need. It depends, of course, on the nature of the operation and the data sets, but if there is any kind of pattern, tracemalloc is your best shot at discovering it.
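The experiment described above could look roughly like this. It is a sketch under assumptions: run_workload is a hypothetical stand-in for the library call, and the quadratic scaling used for the extrapolation is only a guess based on a DP table; the real growth pattern has to come from your own measurements.

```python
import tracemalloc

def run_workload(n):
    # Hypothetical stand-in for the library call: a quadratic DP table.
    return [[0.0] * n for _ in range(n)]

def memory_growth(func, *args):
    """Return the net bytes still allocated after func(*args) has run."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func(*args)  # keep a reference so the data stays alive
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    diffs = after.compare_to(before, "lineno")
    return sum(diff.size_diff for diff in diffs)

sizes = [100, 200, 400]
growth = {n: memory_growth(run_workload, n) for n in sizes}
for n in sizes:
    print(f"n={n}: {growth[n] / 1e6:.2f} MB, {growth[n] / n**2:.1f} bytes per cell")

# Assumption: memory scales with n squared. If bytes per cell looks
# roughly constant above, scale it up to the (hypothetical) target size.
target_n = 50_000
estimate = growth[sizes[-1]] / sizes[-1] ** 2 * target_n ** 2
print(f"estimated need for n={target_n}: {estimate / 1e9:.1f} GB")
```

This measures what is still allocated once the call returns. If the library builds large temporary structures along the way, the peak value returned by tracemalloc.get_traced_memory() is probably the more relevant number to extrapolate from.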

Upvotes: 4

roeen30

Reputation: 789

It appears that MemoryError is not created with any associated data:

def crash():
    x = 32 * 10 ** 9
    return 'a' * x

try:
    crash()
except MemoryError as e:
    print(vars(e))  # prints: {}

This makes sense - how could it if no memory is left?

I don't think there's an easy way out. You can start from the traceback that the MemoryError produces and investigate with a debugger, or use a memory profiler such as pympler (or psutil, as suggested in the comments).
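Starting from the traceback can be sketched as follows. The two functions are hypothetical, and the MemoryError is raised by hand so the example runs without actually exhausting memory. The exception carries no size information, but the innermost frame and its local variables often hint at what was being allocated.

```python
import traceback

def allocate_table(n):
    # Hypothetical failing allocation, raised by hand for illustration.
    raise MemoryError

def library_entry_point(n):
    return allocate_table(n)

try:
    library_entry_point(10**6)
except MemoryError as exc:
    # The last entry is the frame in which the allocation failed.
    frames = traceback.extract_tb(exc.__traceback__)
    innermost = frames[-1]
    print(f"failed in {innermost.name} at {innermost.filename}:{innermost.lineno}")

    # Walk to the innermost traceback entry and copy its local variables.
    tb = exc.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    failing_locals = dict(tb.tb_frame.f_locals)
    print(f"locals at the point of failure: {failing_locals}")
```

If the failure happens inside a C extension, the innermost Python frame is the call into that extension, so the locals show its arguments rather than the internal allocation size.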

Upvotes: 2
