Reputation: 9112
I have an architecture which is basically a queue of URLs and some classes to process the content of those URLs. At the moment the code works, but it is slow to sequentially pull a URL off the queue, send it to the corresponding class, download the URL's content, and finally process it.
It would be faster, and make better use of resources, if it could for example read n URLs out of the queue and spawn n processes or threads to handle the downloading and processing.
I would appreciate any help with this.
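The sequential pipeline described above might be sketched roughly like this; fetch and process are hypothetical stand-ins for the real download step and the processing classes:

```python
import queue

def run_sequential(url_queue, fetch, process):
    """Drain the queue one URL at a time: pull, download, process."""
    results = []
    while True:
        try:
            url = url_queue.get_nowait()   # 1. pull a URL off the queue
        except queue.Empty:
            return results
        content = fetch(url)               # 2. download (the loop blocks here)
        results.append(process(content))   # 3. process the content
```

Each URL waits for the previous one to finish downloading and processing, which is the serial bottleneck the question is about.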
Upvotes: 1
Views: 420
Reputation: 628
You might want to look into the Python multiprocessing library. With multiprocessing.Pool, you can give it a function and a list, and it will call the function on each value of the list in parallel, using as many or as few worker processes as you specify.
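A minimal sketch of that approach; handle_url is a hypothetical worker that would, in real code, download the page and run it through the appropriate processing class (here it just returns the URL's length so the example is self-contained):

```python
from multiprocessing import Pool

def handle_url(url: str) -> int:
    # Stand-in for "download this URL and process its content".
    return len(url)

def process_all(urls, n_workers=4):
    # Pool.map calls handle_url on each URL, spreading the calls
    # across n_workers separate processes; results come back in
    # the same order as the input list.
    with Pool(processes=n_workers) as pool:
        return pool.map(handle_url, urls)
```

Note that the function passed to Pool.map must be defined at module level so it can be pickled and sent to the worker processes.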
Upvotes: 2
Reputation: 3524
If the slow part is blocking C-level calls, like downloading, database requests, or other I/O, you can just use threading.Thread: those calls release the GIL, so threads can overlap.
If the slow part is Python code itself, like frameworks, your own logic, or non-accelerated parsers, you need to use multiprocessing's Pool or Process. That speeds up Python code too, but it is less thread-safe, and using it in complex code requires a deeper understanding of how it works (locks, semaphores).
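For the I/O-bound case, a rough sketch with plain threading.Thread, where n worker threads drain a shared queue; fetch is a hypothetical stand-in for the real download call:

```python
import threading
import queue

def _worker(url_queue, results, fetch):
    # Each thread pulls URLs until the queue is empty. The threads
    # overlap because fetch() blocks on network I/O, releasing the GIL.
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return
        results.append(fetch(url))  # list.append is atomic under the GIL

def download_all(urls, fetch, n_threads=4):
    url_queue = queue.Queue()
    for u in urls:
        url_queue.put(u)
    results = []
    threads = [threading.Thread(target=_worker, args=(url_queue, results, fetch))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the workers race to pull URLs, the results come back in completion order rather than input order.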
Upvotes: 1