Reputation: 261
I wrote a script that performs API calls using the Python threading library. It speeds up the processing by a huge margin because the bottleneck is the network, not anything on my host (enter someone stating Python doesn't do true multithreading here).
The issue is that sometimes when I run the script I receive this error, with my script eventually hanging/sleeping:
pthread_cond_wait: Resource busy
I have no idea how to figure out why this is happening. How do I get more context to debug the issue? Do I need to put print statements in a bunch of random places and hope to catch whatever issue is causing this? Is there a better way to debug?
If it helps, this is how I implemented the multithreading:
for i in range(threads):  # make the threads
    # Each thread runs the queue_worker function with these parameters
    t = threading.Thread(target=queue_worker, args=[apikey, q, retries, hit_threshold])
    t.daemon = True
    t.start()  # start the thread!

# Data is put onto the queue and queue_worker does the API work here...
...
q.join()  # Block until every queued item has been marked task_done (all threads idle)
EDIT:
The queue_worker, do_api and main code is basically this:
import threading
import time
import Queue  # Python 2; on Python 3 this module is named "queue"

import requests

def queue_worker(apikey, q, retries, hit_threshold):
    api_data = q.get()
    for x in range(retries):
        try:
            response = do_api(api_data, apikey)
        except Exception as error:
            time.sleep(5)
            continue
        else:
            error_count = error_count + 1
            q.task_done()
            continue
    # ... data parsing code here...
    # ... printing parsed data to screen here if a particular value returned is greater than "hit_threshold"...
    q.task_done()

def do_api(api_data, apikey):
    params = {'apikey': apikey, 'resource': api_data}
    response = requests.get('https://MYURL.com/api', params=params, timeout=10)
    return response

if __name__ == '__main__':
    threads = 50
    q = Queue.Queue(threads)
    for i in range(threads):  # make the threads
        # Each thread runs the queue_worker function with these parameters
        t = threading.Thread(target=queue_worker, args=[apikey, q, retries, hit_threshold])
        t.daemon = True
        t.start()  # start the thread!
    # Data is put onto the queue and queue_worker does the API work here...
    ...
    q.join()  # Block until every queued item has been marked task_done
Upvotes: 1
Views: 1909
Reputation: 15533
Comment: Any tips on debugging?
Check your use of Lock, Condition and the other threading primitives, especially where they are nested, and take a lock whenever you access shared variables. Read Python Threads and the Global Interpreter Lock and try the "work around" described there.
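For example, error_count in the question is updated from many threads with no lock at all. A minimal sketch of guarding such a shared counter with threading.Lock (the names here are illustrative, not from your script):

```python
import threading

error_count = 0
error_count_lock = threading.Lock()  # one lock guarding the shared counter

def record_error():
    global error_count
    # "with" acquires the lock and always releases it, even on exceptions,
    # so the read-modify-write of error_count cannot interleave across threads
    with error_count_lock:
        error_count = error_count + 1

threads = [threading.Thread(target=record_error) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(error_count)  # 50 -- no updates lost
```

Without the lock, two threads can read the same old value and both write back old value + 1, silently losing increments.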
There are other ways to reduce the impact of the GIL or avoid it entirely:

- call time.sleep()
- set sys.setcheckinterval()
- run Python in optimized mode
- move process-intensive tasks into C extensions
- use the subprocess module to execute commands
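To illustrate the last point: a command started with subprocess runs in a separate process, outside this interpreter's GIL. A tiny sketch (assuming a POSIX system where an echo command is available):

```python
import subprocess

# The child process executes concurrently with our threads and is not
# constrained by this interpreter's GIL
result = subprocess.run(["echo", "hello"], capture_output=True, text=True)
print(result.stdout.strip())  # hello
```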
Most likely you are running into the Python GIL:
what-is-a-global-interpreter-lock-gil
In other words, one of the other threads is holding the lock: there is inconsistent use of locking somewhere in your program.
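As for getting more context when the script hangs: instead of scattering print statements, the standard faulthandler module (Python 3) can dump the traceback of every thread, showing exactly which call each worker is blocked in. A sketch, with a deliberately stuck thread standing in for your workers:

```python
import faulthandler
import threading
import time

# After 1 second, print the traceback of every thread to stderr --
# this shows where each worker is stuck when the script hangs
faulthandler.dump_traceback_later(1, repeat=False)

def worker():
    time.sleep(3)  # stand-in for a thread blocked in q.get() or on a lock

t = threading.Thread(target=worker, daemon=True)
t.start()
t.join(2)  # the traceback dump fires while the thread is still "stuck"
faulthandler.cancel_dump_traceback_later()
print(t.is_alive())  # True -- the worker is still blocked
```

In a real run you would arm dump_traceback_later once at startup, or (on Unix) use faulthandler.register() with a signal so you can trigger a dump on demand while the hung process is running.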
Upvotes: 1