tomfriwel

Reputation: 2635

How to control memory usage in multithreading?

I am using multiple threads to process images.

It works fine on my computer, which has enough memory (usage increases by 2~3 GB when processing many images), but my server only has 1 GB of memory and the code does not work properly there.

It sometimes ends with a Segmentation fault, sometimes with:

Exception in thread Thread-13:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "passportRecognizeNew.py", line 267, in doSomething
  ...

Code:

import threading

def doSomething(image):
    # picture processing code
    print("processing over")

threads = []

for i in range(20):
    thread = threading.Thread(target=doSomething, args=("image",))
    threads.append(thread)

for t in threads:
    t.setDaemon(True)
    t.start()

t.join()

print("All over")

How to solve this or any way to control memory usage?

Upvotes: 3

Views: 4936

Answers (2)

tomfriwel

Reputation: 2635

With GhostCat's help, I used the following code to solve the memory usage problem.

import Queue
import threading
import multiprocessing
import time
import psutil


class ThreadSomething(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # check available memory
            virtualMemoryInfo = psutil.virtual_memory()
            availableMemory = virtualMemoryInfo.available

            print(str(availableMemory / 1024 / 1024) + "M")

            if availableMemory > MEMORY_WARNING:
                # image from queue
                image = self.queue.get()

                # do something
                doSomething(image)

                # signals to queue job is done
                self.queue.task_done()
            else:
                print("memory warning!")
                # avoid busy-waiting while memory is low
                time.sleep(1)

def doSomething(image):
    # picture processing code, cost time and memory
    print("processing over")

# After testing, there seems to be no point in creating more threads than
# CPU_COUNT; execution time is not reduced.
CPU_COUNT = multiprocessing.cpu_count()
MEMORY_WARNING = 200*1024*1024  # 200M

images = ["1.png", "2.png", "3.png", "4.png", "5.png"]
queue = Queue.Queue()

def main():
    # spawn a pool of threads, and pass them the queue instance
    for i in range(CPU_COUNT):
        t = ThreadSomething(queue)
        t.setDaemon(True)
        t.start()

    # populate the queue with data
    for image in images:
        queue.put(image)

    # wait on the queue until everything has been processed
    queue.join()

start = time.time()
main()
print('All over. Elapsed Time: %s' % (time.time() - start))

I use the psutil module to get the available memory.
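For reference, a minimal check of what `psutil.virtual_memory()` reports (the field names are from psutil's documented API; the 200M threshold is simply the value used in the code above):

```python
import psutil

mem = psutil.virtual_memory()
print("total:     %d MB" % (mem.total // (1024 * 1024)))
print("available: %d MB" % (mem.available // (1024 * 1024)))

MEMORY_WARNING = 200 * 1024 * 1024  # same 200M threshold as above
print("enough memory to start a job: %s" % (mem.available > MEMORY_WARNING))
```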

Reference code: yosemitebandit/ibm_queue.py

The code in my question has the problem of creating more threads than CPU_COUNT.

Upvotes: 5

GhostCat

Reputation: 140603

I think you are looking at this from the wrong angle. Your code fires up n threads. Those threads then execute work that you defined for them.

If that work requires them to allocate a lot of memory - what should anything "outside" of that context do about this? What should happen? Should some of the threads be killed? Should somewhere, deep down in C code a malloc ... not happen ... and then?

What I am saying is: your problem is most likely that you are simply firing up too many of those threads.

Thus the answer is: don't try to fix things after you broke them - better make sure you do not break them at all:

  • do careful profiling, to understand your application; so you can assess how much memory a single thread requires to get its "work" done
  • then change your "main" program to query the hardware it is running on (like: check for available memory and number of physical CPUs that are available)
  • and based on that assessment, start that number of threads that should work given the aforementioned hardware details
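The steps above can be sketched roughly as follows. The 150 MB per-worker figure is a made-up placeholder for whatever profiling actually shows, and in practice the available-memory number would come from a query such as `psutil.virtual_memory().available` rather than being passed in by hand:

```python
import multiprocessing

# Hypothetical profiling result: assume each worker needs ~150 MB.
PER_WORKER_MB = 150

def pick_worker_count(available_mb):
    # never more workers than physical/logical CPUs...
    cpu_limit = multiprocessing.cpu_count()
    # ...and never more than the memory budget allows
    memory_limit = available_mb // PER_WORKER_MB
    return max(1, min(cpu_limit, memory_limit))

print(pick_worker_count(800))   # memory-constrained, e.g. on a 1 GB box
print(pick_worker_count(8000))  # CPU-constrained, e.g. on a large box
```

The `max(1, ...)` guard just ensures at least one worker runs even when the memory budget looks too small.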

Beyond that: this is a very common pattern. The developer has a "powerful" machine he is working on, and he implicitly assumes that any target system running his product will have the same or better characteristics. And that is simply not true.

In other words: when you don't know what the hardware your code is running on looks like, there is only one reasonable thing to do: first acquire that knowledge, and afterwards do different things based on real data.

Upvotes: 5
