O.rka

Reputation: 30687

How to effectively use parallelization with Ray in Python?

I'm trying to learn how to use the Ray API and am comparing it with my joblib code. However, I don't know how to use it effectively (my machine has 16 CPUs).

Am I doing something incorrectly? If not, why is Ray so much slower?

import ray
from joblib import Parallel, delayed

num_cpus = 16

@ray.remote(num_cpus=num_cpus)
def square(x):
    return x * x

def square2(x):
    return x * x

Ray:

%%time
# Launch parallel square tasks.
futures = [square.remote(x=i) for i in range(1000)]

# Retrieve results.
print(len(ray.get(futures)))
# CPU times: user 310 ms, sys: 79.7 ms, total: 390 ms
# Wall time: 612 ms

Joblib:

%%time
futures = Parallel(n_jobs=num_cpus)(delayed(square2)(x=i) for i in range(1000))
print(len(futures))
# CPU times: user 92.5 ms, sys: 21.4 ms, total: 114 ms
# Wall time: 106 ms

Upvotes: 0

Views: 1966

Answers (1)

Stephanie Wang

Reputation: 254

The Ray scheduler decides how many Ray tasks run concurrently based on their num_cpus value (along with other resource types for more advanced use cases). By default this value is 1, so Ray can run as many tasks in parallel as there are CPU cores. By setting it to 16, you are telling Ray that each task needs all 16 CPUs to itself, so the square tasks essentially run one after another. Try running it again with just a plain @ray.remote!

You may also want to warm up Ray by running the workload a few times within the same script, since there is a one-time cost from worker process startup at the beginning.

Finally, in a real workload you would probably want to do more than multiply two integers together. Each of these tasks finishes almost instantly, so the overhead of distributing the work outweighs any benefit from running it in parallel. There's some good info on this anti-pattern here.

Upvotes: 4
