Reputation: 8424
I'm trying to profile a basic function in Python to see the comparative advantage of multithreading for evaluating its results. It seems the threaded version performs increasingly poorly as the size of the data the function is applied across increases. Is there overhead in starting threads that I'm not taking into account here? Can someone explain how to actually achieve multithreaded optimization, or what I'm doing wrong?
from multiprocessing import Pool
def f(x):
    return x*x
pool = Pool(processes=4)
import timeit
print timeit.timeit('map(f, range(20000000))', setup = "from __main__ import f", number = 1)
print timeit.timeit('pool.map(f, range(20000000))', setup = "from __main__ import f, pool", number = 1)
Results:
5.90005707741
11.8840620518
[Finished in 18.9s]
If relevant, I ran this in Sublime Text 3.
Upvotes: 0
Views: 226
Reputation: 880
@john has already given the answer, but I want to provide an example too:
from multiprocessing import Pool
def f(n):  # naive recursive Fibonacci
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return f(n-1) + f(n-2)
pool = Pool(processes=4)
import timeit
print timeit.timeit('map(f, xrange(35))', setup = "from __main__ import f", number = 1)
print timeit.timeit('pool.map(f, xrange(35))', setup = "from __main__ import f, pool", number = 1)
Result:
4.349
2.497
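One caveat: on platforms where worker processes are spawned rather than forked (Windows, notably), the Pool must be created under an if __name__ == '__main__': guard, or each worker will re-execute the module-level Pool creation when it imports the script. A guarded sketch of the same comparison:

from multiprocessing import Pool
import timeit

def f(n):  # same naive recursive Fibonacci as above
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return f(n-1) + f(n-2)

if __name__ == '__main__':
    # Creating the Pool inside the guard keeps spawned workers from
    # recursively creating pools of their own on import.
    pool = Pool(processes=4)
    print timeit.timeit('map(f, xrange(35))', setup="from __main__ import f", number=1)
    print timeit.timeit('pool.map(f, xrange(35))', setup="from __main__ import f, pool", number=1)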
Upvotes: 0
Reputation: 249093
The "unit of work" you do in each job is way too small. This is often a concern when you "map" jobs like this--the overhead of the mapping process dominates. Of course mapping a job to a separate process is more time consuming than mapping in the same process, so it is no surprise that the multiprocess solution is slower.
Try it with a function that does a lot more computation, and you will see the benefits of multiprocessing.
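For example, a minimal sketch where each call does arbitrary busy-work (the loop bound of 10000 is made up, chosen only so that each item costs enough to outweigh the dispatch overhead):

from multiprocessing import Pool
import timeit

def f(x):
    # Arbitrary busy-work so each item is expensive relative to
    # the cost of shipping it to a worker process.
    total = 0
    for i in xrange(10000):
        total += i * x
    return total

if __name__ == '__main__':
    pool = Pool(processes=4)
    print timeit.timeit('map(f, range(1000))', setup="from __main__ import f", number=1)
    print timeit.timeit('pool.map(f, range(1000))', setup="from __main__ import f, pool", number=1)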
Upvotes: 1