Reputation: 764
I am using the code below on a dictionary of roughly 100,000 keys and values. I want to make it faster with multiprocessing or multithreading, since each loop iteration is independent of the others. Can anyone tell me how to apply them, and which one (multiprocessing or multithreading) is more appropriate for this kind of work?
from urlparse import urlparse

def ProcessAllURLs(URLs):
    for eachurl in URLs:
        x = urlparse(eachurl)
        print x.netloc

ProcessAllURLs(URLs)
Thanks
Upvotes: 1
Views: 175
Reputation: 1858
The multiprocessing library is probably best for your example. It looks like your code could be rewritten as:
from urlparse import urlparse
from multiprocessing import Pool

nprocs = 2  # number of worker processes to run

ParsePool = Pool(nprocs)
ParsedURLS = ParsePool.map(urlparse, URLs)
Pool.map works like the built-in map function, but distributes the calls across the pool of worker processes.
See http://docs.python.org/library/multiprocessing.html for more on multiprocessing.
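ParsedURLS comes back as a list of ParseResult objects in the same order as URLs, so (as a usage example, assuming you only need the network locations) you can collect the netlocs in the main process:

netlocs = [p.netloc for p in ParsedURLS]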
Upvotes: 1
Reputation: 13216
I would recommend Python's multiprocessing library. In particular, study the section labeled "Using a pool of workers". It should be pretty quick to rework the above code so that it uses all available cores of your system.
One tip, though: don't print the URLs from the pool workers. It is better to pass the results back to the main process and aggregate them there for printing; printing from several processes at once produces jumbled, uncoordinated console output. A sketch of this pattern follows.
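A minimal sketch of that approach, assuming the same urlparse workload from the question (parse_netloc is a hypothetical helper name, and the URLs list here is a stand-in; Pool() with no argument starts one worker per available core, and the worker function must be defined at module level so it can be pickled):

from urlparse import urlparse
from multiprocessing import Pool

def parse_netloc(url):
    # runs in a worker process; return the result instead of printing it
    return urlparse(url).netloc

if __name__ == '__main__':
    URLs = ['http://example.com/a', 'http://example.org/b']  # stand-in for the real list
    pool = Pool()  # no argument: one worker per available core
    netlocs = pool.map(parse_netloc, URLs)
    pool.close()
    pool.join()
    # aggregate and print in the main process, not in the workers
    for netloc in netlocs:
        print netloc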
Upvotes: 1