Reputation: 30687
I'm trying to learn how to use the Ray API and comparing it with my joblib code. However, I don't know how to use it effectively (my machine has 16 CPUs).
Am I doing something incorrectly? If not, why is Ray so much slower?
import ray
from joblib import Parallel, delayed

num_cpus = 16

@ray.remote(num_cpus=num_cpus)
def square(x):
    return x * x

def square2(x):
    return x * x
Ray:
%%time
# Launch parallel square tasks.
futures = [square.remote(x=i) for i in range(1000)]
# Retrieve results.
print(len(ray.get(futures)))
# CPU times: user 310 ms, sys: 79.7 ms, total: 390 ms
# Wall time: 612 ms
Joblib:
%%time
futures = Parallel(n_jobs=num_cpus)(delayed(square2)(x=i) for i in range(1000))
print(len(futures))
# CPU times: user 92.5 ms, sys: 21.4 ms, total: 114 ms
# Wall time: 106 ms
Upvotes: 0
Views: 1966
Reputation: 254
The Ray scheduler decides how many Ray tasks run concurrently based on their num_cpus value (along with other resource types for more advanced use cases). By default, this value is 1, meaning you can run parallel tasks up to the total number of cores. By setting it to 16, you are telling Ray that each task requires all 16 CPUs to run, so you are essentially running the square tasks sequentially. Try running it again with just a plain @ray.remote, as in the sketch below!
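A minimal sketch of that fix (untested, assuming Ray is installed locally and can use all cores on the machine):

import ray

ray.init()  # by default Ray uses all available CPUs on the machine

@ray.remote  # default num_cpus=1, so up to 16 tasks can run at once on a 16-core machine
def square(x):
    return x * x

futures = [square.remote(i) for i in range(1000)]
print(len(ray.get(futures)))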
You may also want to warm up Ray by running the workload a few times within the same script, since there is some startup cost from launching the worker processes at the beginning.
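For example, one rough way to see the difference (assuming the plain-decorator square task from the sketch above) is to time a second run after the first run has already started the workers:

import time

# First run pays the worker startup cost.
ray.get([square.remote(i) for i in range(1000)])

# A second, "warm" run should be noticeably faster.
start = time.time()
ray.get([square.remote(i) for i in range(1000)])
print(f"warm run took {time.time() - start:.3f} s")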
Finally, in a real workload you would probably want each task to do more than multiply two integers together. Each of these tasks finishes nearly instantaneously, so you see more overhead than benefit from the extra work of distributing the execution. There's some good info on this anti-pattern here.
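One common way around that is to batch many small computations into each task so the useful work outweighs the scheduling overhead; square_batch below is just a hypothetical illustration, not part of the original code:

@ray.remote
def square_batch(xs):
    # Each task processes a whole chunk, amortizing the per-task overhead.
    return [x * x for x in xs]

batch_size = 100
batches = [list(range(i, i + batch_size)) for i in range(0, 1000, batch_size)]
results = ray.get([square_batch.remote(b) for b in batches])
flat = [y for chunk in results for y in chunk]
print(len(flat))  # 1000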
Upvotes: 4