I'm running a malware analysis experiment in Python, and I need to create a big object (about 512 MB, I think). When testing locally on a 64-bit system there is no problem, but when I run it on a remote 32-bit system (so the process has at most 4 GB of virtual address space, of which typically only 2 to 3 GB is usable), I get a MemoryError; the traceback doesn't give much information. The big allocation happens here:
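import numpy
from sklearn import svm
from sklearn.model_selection import GridSearchCV  # older scikit-learn releases: sklearn.grid_search
...
# Grid-search 7 values of C, from 1e-3 to 1e3
model = GridSearchCV(svm.LinearSVC(), {'C': numpy.logspace(-3, 3, 7)})
model.fit(train_vectors, labels)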
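Whether the 32-bit ceiling applies depends on how the Python interpreter itself was built, not only on the OS: a 32-bit Python running on a 64-bit machine is limited in the same way. A minimal standard-library sketch to check this on the remote machine (the helper names are hypothetical, not from any library):

```python
import struct
import sys

def interpreter_bits():
    """Return the pointer width (32 or 64) of the running Python interpreter."""
    # "P" is the native void* format: 4 bytes on a 32-bit build, 8 on a 64-bit build.
    return struct.calcsize("P") * 8

def is_64bit():
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize > 2**32

print(interpreter_bits(), is_64bit())
```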
I asked the system's sysadmin, and he told me that the previous allocations have probably fragmented the heap, so there is no longer a contiguous free block large enough for the big allocation.
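One rough way to test the fragmentation theory, sketched here as a hypothetical helper rather than a standard tool, is to binary-search the largest single contiguous buffer the process can obtain at a given moment (for example right before model.fit). It assumes that if one size fails, every larger size fails too. On systems with memory overcommit (e.g. default Linux settings), a huge request may be granted and the process killed later instead of raising MemoryError, so keep limit_mb close to the size you actually need.

```python
def largest_block_mb(limit_mb):
    """Hypothetical diagnostic: largest single allocation (in MiB, up to
    limit_mb) that currently succeeds."""
    lo, hi = 0, limit_mb
    while lo < hi:
        mid = (lo + hi + 1) // 2
        try:
            # bytearray(n) requests one contiguous, zero-filled buffer of n bytes
            block = bytearray(mid * 1024 * 1024)
        except MemoryError:
            hi = mid - 1  # too big right now; search smaller sizes
        else:
            del block     # release the buffer immediately
            lo = mid      # this size fits; search larger sizes
    return lo

print(largest_block_mb(16))
```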
I've tried running gc.collect() right before the call that triggers the big allocation, but the problem persists.
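For context on why gc.collect() doesn't help here: in CPython it only reclaims unreachable objects caught in reference cycles, and CPython never moves live objects, so there is no compaction step that could merge scattered free space into one large block. The small sketch below only shows what gc.collect() actually reports (the Node class is made up for illustration):

```python
import gc

class Node:
    def __init__(self):
        self.other = None

def make_cycle():
    # Two objects that reference each other: reference counting alone
    # never frees them; only the cyclic garbage collector can.
    a, b = Node(), Node()
    a.other, b.other = b, a

gc.collect()          # start from a clean state
make_cycle()          # the cycle is now unreachable garbage
found = gc.collect()  # returns the number of unreachable objects found
print(found)
```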
I don't think there's a way to make the big allocation smaller.
Any suggestions on how I could defragment the heap?
Edit: I've managed to make the training vectors a lot smaller. Now I need to check whether the malware detection technique still works; if it does, my problem should be solved. The vectors had been numpy arrays, and just calling tolist() on them made them a lot smaller.
Edit 2: Just using lists (of floats) wasn't enough. Because the values were integers anyway, I cast the floats to ints, which made the vectors only a little smaller but had a big impact on memory usage. I was saving the vectors with cPickle and loading them with the same module, so I'm guessing there's a bug somewhere in that module that causes a memory leak when loading floats that isn't present for ints.
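The sketch below does not confirm or rule out a leak in cPickle; it only measures one concrete difference. It uses Python 3's pickle module (which uses its C accelerator automatically, taking the place of Python 2's cPickle) and made-up sample data to show that integer-valued data serializes more compactly as ints than as floats, under several protocols.

```python
import pickle

# Made-up feature vector: whole numbers stored as floats.
as_floats = [float(i % 200) for i in range(10000)]
as_ints = [int(x) for x in as_floats]

for protocol in (0, 2, pickle.HIGHEST_PROTOCOL):
    f_size = len(pickle.dumps(as_floats, protocol))
    i_size = len(pickle.dumps(as_ints, protocol))
    print(protocol, f_size, i_size)  # protocol, bytes as floats, bytes as ints
```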
TL;DR: I didn't find a way to defragment the heap, but I was able to locate the problem using memory-profiler (which I liked best) and heapy. I solved it by making the vectors smaller and changing their data type to ints instead of floats. I suspect the cPickle module I used to store and load the vectors has a memory leak with floats, which is why I ran out of memory.
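memory-profiler and heapy are third-party packages; as a swapped-in alternative, the standard library's tracemalloc module (Python 3.4+) can give a similar first answer. A minimal sketch, with a hypothetical build_vectors function standing in for loading the training data, comparing peak Python-level allocation for floats versus ints. In CPython, small ints (-5 to 256) are cached shared objects while each float is a separate object, which is one possible contributor to the difference the cast made; whether it applies depends on the actual values.

```python
import tracemalloc

def peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def build_vectors(n, as_float):
    # Hypothetical stand-in for loading the training vectors.
    cast = float if as_float else int
    return [[cast(j % 200) for j in range(100)] for _ in range(n)]

_, float_peak = peak_memory(build_vectors, 1000, True)
_, int_peak = peak_memory(build_vectors, 1000, False)
print(float_peak, int_peak)
```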
You should use a memory profiler.
From the question "Which Python memory profiler is recommended?":
You could also try the objgraph library.
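If installing a third-party profiler isn't an option, a tracemalloc snapshot from the standard library can answer a similar question: which source lines currently hold the most memory. A minimal sketch; top_allocations and workload are hypothetical names, not part of any library.

```python
import tracemalloc

def top_allocations(func, limit=3):
    """Run func and return the `limit` source lines holding the most traced memory."""
    tracemalloc.start()
    try:
        func()
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # statistics() groups allocations by line, sorted from biggest to smallest
    return snapshot.statistics("lineno")[:limit]

def workload():
    # Hypothetical workload; keep a reference so the memory is still
    # live when the snapshot is taken.
    workload.data = [float(i) for i in range(100000)]

for stat in top_allocations(workload):
    print(stat)  # file:line with total size, allocation count and average size
```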