Reputation: 2635
I am using multiple threads to process images.
It works fine on my computer, which has enough memory (usage increases by 2~3 GB when processing many images), but my server only has 1 GB of memory and the code does not work properly there.
Sometimes it ends with a Segmentation fault, sometimes with:
Exception in thread Thread-13:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "passportRecognizeNew.py", line 267, in doSomething
...
Code:
import threading

def doSomething(image):
    # picture processing code
    print("processing over")

threads = []
for i in range(20):
    thread = threading.Thread(target=doSomething, args=("image",))
    threads.append(thread)

for t in threads:
    t.setDaemon(True)
    t.start()
    t.join()

print("All over")
How can I solve this, or is there any way to control memory usage?
Upvotes: 3
Views: 4936
Reputation: 2635
With GhostCat's help, I used the following code to solve the memory usage problem.
import Queue
import threading
import multiprocessing
import time

import psutil


class ThreadSomething(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # check available memory
            virtualMemoryInfo = psutil.virtual_memory()
            availableMemory = virtualMemoryInfo.available
            print(str(availableMemory/1024/1024)+"M")

            if availableMemory > MEMORY_WARNING:
                # get an image from the queue
                image = self.queue.get()
                # do something
                doSomething(image)
                # signal to the queue that the job is done
                self.queue.task_done()
            else:
                # do not pull more work while memory is low; keep checking
                print("memory warning!")


def doSomething(image):
    # picture processing code, costs time and memory
    print("processing over")


# After testing, there seems to be no point in creating more threads than
# CPU_COUNT: execution time is not reduced.
CPU_COUNT = multiprocessing.cpu_count()

MEMORY_WARNING = 200*1024*1024  # 200 MB

images = ["1.png", "2.png", "3.png", "4.png", "5.png"]
queue = Queue.Queue()


def main():
    # spawn a pool of threads, and pass them the queue instance
    for i in range(CPU_COUNT):
        t = ThreadSomething(queue)
        t.setDaemon(True)
        t.start()

    # populate the queue with data
    for image in images:
        queue.put(image)

    # wait on the queue until everything has been processed
    queue.join()

start = time.time()
main()
print 'All over. Elapsed Time: %s' % (time.time() - start)
I use the psutil module to get the available memory.
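For reference, this is roughly what psutil reports (a minimal check, separate from the worker code above; the field names come from psutil's virtual_memory() result):

import psutil

mem = psutil.virtual_memory()
# total/available are in bytes, percent is how much is currently in use
print("total:     %d MB" % (mem.total / 1024 / 1024))
print("available: %d MB" % (mem.available / 1024 / 1024))
print("used:      %.1f %%" % mem.percent)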
Reference code: yosemitebandit/ibm_queue.py
The code in my question has the problem of creating more threads than CPU_COUNT.
Upvotes: 5
Reputation: 140603
I think you are looking at this from the wrong angle. Your code fires up n threads. Those threads then execute the work that you defined for them.
If that work requires them to allocate a lot of memory, what should anything "outside" of that context do about it? What should happen? Should some of the threads be killed? Should a malloc somewhere deep down in C code
... not happen ... and then what?
What I am saying is: most likely, your problem is simply that you are firing up too many of those threads.
Thus the answer is: don't try to fix things after you broke them; instead, make sure you do not break them in the first place. Limit how many threads are working at the same time, for example by having a small, fixed pool of workers pull images from a queue, as sketched below.
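A rough sketch of that idea, using the thread-based pool that ships with the standard library (doSomething and the image list here are placeholders, not your real code):

from multiprocessing.dummy import Pool as ThreadPool  # thread-based pool
import multiprocessing

def doSomething(image):
    # placeholder for the real picture processing code
    print("processing %s" % image)

images = ["1.png", "2.png", "3.png", "4.png", "5.png"]

# never start more workers than there are CPU cores; at most that many
# images are being processed (and held in memory) at any one time
pool = ThreadPool(multiprocessing.cpu_count())
pool.map(doSomething, images)
pool.close()
pool.join()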
Beyond that: this is a very common pattern. The developer has a "powerful" machine to work on, and he implicitly assumes that any target system running his product will have the same or better characteristics. And that is simply not true.
In other words: when you don't know what the hardware your code runs on looks like, there is only one reasonable thing to do: first acquire that knowledge, and afterwards do different things based on real data.
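A rough sketch of what "based on real data" could mean here; the 150 MB per worker is a made-up estimate, you would have to measure what processing one image actually costs:

import multiprocessing
import psutil

# made-up estimate of how much memory one worker needs; measure this yourself
MEMORY_PER_WORKER = 150 * 1024 * 1024

available = psutil.virtual_memory().available
by_memory = max(1, int(available // MEMORY_PER_WORKER))
by_cpu = multiprocessing.cpu_count()

# run only as many workers as both the CPUs and the memory of this machine allow
workers = min(by_cpu, by_memory)
print("starting %d worker threads" % workers)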
Upvotes: 5