Reputation: 4434
I would like to use Python's multithreading capability for my app, but I ran into what I suspect is a performance issue. The site is hosted on GAE and talks to a REST server on EC2 to do some calculations. The REST server is powered by bottlepy.
My question is:
On the GAE side, I have a loop which calls the REST server multiple times to do the calculations. To improve performance, I use the threading library, but I found that some of the calculations go missing. Usually I do not have this issue when only twenty jobs are fired, but I do when 200 jobs are fired. I appreciate any suggestions.
Here is my code:
from threading import Thread
from google.appengine.api import urlfetch

all_threads = []

def my_function():
    ...
    response = urlfetch.fetch(url=url, payload=data, method=urlfetch.POST,
                              headers=http_headers, deadline=60)

# In this loop, I use Thread to enable multithreading
def loop_fun():
    for i in range(100):
        p = Thread(target=my_function)
        all_threads.append(p)

# Start all threads
[x.start() for x in all_threads]
# Wait for all of them to finish
[x.join() for x in all_threads]
Below is the error message for one job (usually I receive several errors of this type):
Exception in thread Thread-12:
Traceback (most recent call last):
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\dist27\threading.py", line 569, in __bootstrap_inner
    self.run()
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\dist27\threading.py", line 522, in run
    self.__target(*self.__args, **self.__kwargs)
  File "D:\Dropbox\ubertool_src\genee\genee_model.py", line 102, in __init__
    response = urlfetch.fetch(url=url, payload=data, method=urlfetch.POST, headers=http_headers, deadline=60)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\urlfetch.py", line 270, in fetch
    return rpc.get_result()
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\apiproxy_stub_map.py", line 612, in get_result
    return self.__get_result_hook(self)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\urlfetch.py", line 403, in _get_fetch_result
    raise DownloadError("Unable to fetch URL: " + url + error_detail)
DownloadError: Unable to fetch URL: http://url_20140122160100678000 Error: [Errno 10061] No connection could be made because the target machine actively refused it
Upvotes: 0
Views: 117
Reputation: 4132
If the problem is one of overload, it might benefit from a "pool of workers" strategy.
import threading
import Queue

def worker(jobs):
    while True:
        url = jobs.get()
        if url is None:
            break
        # do stuff with the URL

if __name__ == '__main__':
    thread_count = 30
    job_q = Queue.Queue()
    pool = [threading.Thread(target=worker, args=(job_q,))
            for i in range(thread_count)]
    for p in pool:
        p.start()
    for url in urls_to_get:
        job_q.put(url)
    # Signal each thread that there are no more jobs.
    for p in pool:
        job_q.put(None)
    for p in pool:
        p.join()
This way, you can control how many simultaneous requests are taking place by limiting the quantity of threads.
FYI: Python is not really good at threading (depending on the interpreter). Some interpreters have a Global Interpreter Lock that prevents multiple threads from running at once. Threading works OK for I/O-bound tasks, but not for making efficient use of the CPU. For true parallelism, use multiprocessing. The changes to my (untested) sample code above would be to use multiprocessing instead of threading and create a Process instead of a Thread.
Upvotes: 1