Reputation: 2956
I'm currently learning Python (from a Java background), and I have a question about something I would have used threads for in Java.
My program will use workers to read some data from a web-service periodically; each worker will poll the service on its own schedule.
From what I have read, it's preferable to use the multiprocessing module and set up the workers as independent processes that get on with their data-gathering tasks. In Java I would have done something conceptually similar, but using threads. While it appears I can use threads in Python, I'll lose out on multi-CPU utilisation because of the GIL.
Here's the guts of my question: the web-service is throttled, i.e. the workers must not call it more than x times per second. What is the best way for the workers to check whether they may request data?
I'm confused as to whether this should be achieved using:

- mmap, to share some data/value between the processes that describes whether they may call the web-service, or
- a Manager() object that monitors the calls per second and informs workers whether they have permission to make their calls.

Of course, I guess this may come down to how I keep track of the calls per second. One option would be for the workers to call a function on some other object, which makes the call to the web-service and records the current number of calls per second. Another option would be for the function that calls the web-service to live within each worker, and for each worker to message a managing object every time it makes a call.
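For what it's worth, the second idea could be sketched roughly like this. Everything here (the name may_call, the limit of 5 calls/sec, the timestamp-list scheme) is my own assumption, not a prescribed API; it's just a minimal illustration of workers asking a Manager()-held shared state for permission:

```python
import time
from multiprocessing import Manager, Process

MAX_CALLS_PER_SEC = 5  # hypothetical throttle limit

def may_call(state, lock, limit=MAX_CALLS_PER_SEC):
    """Return True and record the call if we are still under the per-second limit."""
    with lock:
        now = time.time()
        # keep only timestamps from the last second
        recent = [t for t in state['calls'] if now - t < 1.0]
        allowed = len(recent) < limit
        if allowed:
            recent.append(now)
        state['calls'] = recent  # reassign so the Manager proxy sees the update
        return allowed

def worker(state, lock, results, idx):
    """Each worker asks for permission before it would hit the web-service."""
    results[idx] = may_call(state, lock)

if __name__ == '__main__':
    with Manager() as mgr:
        state = mgr.dict({'calls': []})
        lock = mgr.Lock()
        results = mgr.dict()
        procs = [Process(target=worker, args=(state, lock, results, i))
                 for i in range(8)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(sum(results.values()))  # at most MAX_CALLS_PER_SEC grants in this burst
```

The lock matters: without it, two workers could both see 4 recent calls and both grant themselves the 5th slot.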
Thoughts welcome!
Upvotes: 13
Views: 687
Reputation: 39344
I think that you'll find that the multiprocessing module provides some fairly familiar constructs.
You might find that multiprocessing.Queue is useful for connecting your worker processes back to a managing process that provides monitoring or throttling.
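A minimal sketch of that shape might look like the following; the message format, the sentinel convention, and the simple sleep-based pacing are assumptions of mine, not part of the Queue API:

```python
import time
from multiprocessing import Process, Queue

def worker(requests, worker_id, n_calls):
    """Hypothetical worker: announces each intended web-service call on the queue."""
    for i in range(n_calls):
        requests.put((worker_id, i))
    requests.put((worker_id, None))  # sentinel: this worker is finished

def manager_loop(requests, n_workers, max_per_sec=5):
    """Drain the queue, pacing grants so the service never sees more
    than max_per_sec calls per second."""
    interval = 1.0 / max_per_sec
    done = 0
    granted = 0
    while done < n_workers:
        worker_id, msg = requests.get()
        if msg is None:
            done += 1
            continue
        granted += 1           # here the real code would perform/permit the call
        time.sleep(interval)   # crude pacing: one grant per interval
    return granted

if __name__ == '__main__':
    q = Queue()
    workers = [Process(target=worker, args=(q, w, 3)) for w in range(2)]
    for p in workers:
        p.start()
    total = manager_loop(q, n_workers=2, max_per_sec=50)
    for p in workers:
        p.join()
    print(total)  # 6: every request flowed through the single managing process
```

Because only the managing process touches the web-service schedule, the workers need no shared counters or locks at all.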
Upvotes: 2
Reputation: 9953
Not really an answer to your question, but an alternative approach to your problem: you could avoid the synchronization issues entirely by making the requests event-driven, e.g. with Python's asyncore module or Twisted. You wouldn't benefit from multiple CPUs/cores, but in the context of network-bound communication that's usually negligible.
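To illustrate the idea (using asyncio as a stand-in for asyncore/Twisted, and with fetch as a placeholder for a real async HTTP call): since a single event loop issues every request, throttling is just a delay between launches, with no cross-process coordination needed.

```python
import asyncio

async def fetch(i):
    """Placeholder for one web-service request (a real async HTTP call would go here)."""
    await asyncio.sleep(0)
    return i

async def run_throttled(n, max_per_sec):
    """Issue n requests from one event loop, spaced to honour the rate limit."""
    interval = 1.0 / max_per_sec
    results = []
    for i in range(n):
        results.append(await fetch(i))
        await asyncio.sleep(interval)  # one loop paces everything; no locks needed
    return results

if __name__ == '__main__':
    print(asyncio.run(run_throttled(5, max_per_sec=100)))  # [0, 1, 2, 3, 4]
```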
Upvotes: 0
Reputation: 798386
Delegate the retrieval to a separate process which queues the requests until it is their turn.
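One way to sketch this, assuming a request/response pair of queues and a doubling stand-in for the real web-service call (all names here are mine): the fetcher process owns every call to the service and simply drains requests in arrival order at the permitted rate.

```python
import time
from multiprocessing import Process, Queue

def fetcher(inbox, outbox, max_per_sec=5):
    """Single process that owns all web-service calls, serving queued requests in turn."""
    interval = 1.0 / max_per_sec
    while True:
        req = inbox.get()
        if req is None:          # sentinel: shut down
            break
        outbox.put(req * 2)      # stand-in for the real web-service response
        time.sleep(interval)     # the next queued request waits its turn

if __name__ == '__main__':
    inbox, outbox = Queue(), Queue()
    p = Process(target=fetcher, args=(inbox, outbox, 100))
    p.start()
    for req in range(3):
        inbox.put(req)           # workers just enqueue; no throttling logic needed
    inbox.put(None)
    p.join()
    print([outbox.get() for _ in range(3)])  # [0, 2, 4]
```

The workers never need to know the rate limit exists; queued requests are simply answered more slowly when the limit bites.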
Upvotes: 2