arun
arun

Reputation: 11023

How to run ray correctly?

Trying to understand how to correctly program with ray.

The results below do not seem to agree with the performance improvement of ray as explained here.

Environment:

Here are the machine specs:

>>> import psutil
>>> psutil.cpu_count(logical=False)
4
>>> psutil.cpu_count(logical=True)
8
>>> mem = psutil.virtual_memory()
>>> mem.total
33707012096 # 32 GB

First, the traditional python multiprocessing with Queue (multiproc_function.py):

import time
from multiprocessing import Process, Queue

N_PARALLEL = 8
N_LIST_ITEMS = int(1e8)

def loop(n, nums, q):
    print(f"n = {n}")
    s = 0
    start = time.perf_counter()
    for e in nums:
        s += e
    t_taken = round(time.perf_counter() - start, 2)
    q.put((n, s, t_taken))

if __name__ == '__main__':
    results = []

    nums = list(range(N_LIST_ITEMS))

    q = Queue()

    procs = []
    for i in range(N_PARALLEL):
        procs.append(Process(target=loop, args=(i, nums, q)))

    for proc in procs:
        proc.start()

    for proc in procs:
        n, s, t_taken = q.get()
        results.append((n, s, t_taken))

    for proc in procs:
        proc.join()

    for r in results:
        print(r)

The results are:

$ time python multiproc_function.py
n = 0
n = 1
n = 2
n = 3
n = 4
n = 5
n = 6
n = 7
(0, 4999999950000000, 11.12)
(1, 4999999950000000, 11.14)
(2, 4999999950000000, 11.1)
(3, 4999999950000000, 11.23)
(4, 4999999950000000, 11.2)
(6, 4999999950000000, 11.22)
(7, 4999999950000000, 11.24)
(5, 4999999950000000, 11.54)

real    0m19.156s
user    1m13.614s
sys     0m24.496s

When inspecting htop during the run, the memory went from 2.6 GB base consumption to 8 GB, and had all the 8 processors fully consumed. Also, it is clear from user+sys > real that parallel processing is happening.

Here is the ray test code (ray_test.py):

import time
import psutil
import ray

N_PARALLEL = 8
N_LIST_ITEMS = int(1e8)

use_logical_cores = False
num_cpus = psutil.cpu_count(logical=use_logical_cores)
if use_logical_cores:
    print(f"Setting num_cpus to # logical cores  = {num_cpus}")
else:
    print(f"Setting num_cpus to # physical cores = {num_cpus}")
ray.init(num_cpus=num_cpus)

@ray.remote
def loop(nums, n):
    print(f"n = {n}")
    s = 0
    start = time.perf_counter()
    for e in nums:
        s += e
    t_taken = round(time.perf_counter() - start, 2)
    return (n, s, t_taken)

if __name__ == '__main__':
    nums = list(range(N_LIST_ITEMS))
    list_id = ray.put(nums)
    results = ray.get([loop.remote(list_id, i) for i in range(N_PARALLEL)])
    for r in results:
        print(r)

Results are:

$ time python ray_test.py
Setting num_cpus to # physical cores = 4
2020-04-28 16:52:51,419 INFO resource_spec.py:205 -- Starting Ray with 18.16 GiB memory available for workers and up to 9.11 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
(pid=78483) n = 2
(pid=78485) n = 1
(pid=78484) n = 3
(pid=78486) n = 0
(pid=78484) n = 4
(pid=78483) n = 5
(pid=78485) n = 6
(pid=78486) n = 7
(0, 4999999950000000, 5.12)
(1, 4999999950000000, 5.02)
(2, 4999999950000000, 4.8)
(3, 4999999950000000, 4.43)
(4, 4999999950000000, 4.64)
(5, 4999999950000000, 4.61)
(6, 4999999950000000, 4.84)
(7, 4999999950000000, 4.99)

real    0m45.082s
user    0m22.163s
sys     0m10.213s

The real time is much longer than that of python multiprocessing. Also, real is greater than user+sys. When inspecting htop, the memory went up to 30 GB and the cores were also not fully saturated. All these seem to contradict what ray is supposed to do.

Then I set use_logical_cores to True. The run gets killed due to out of memory:

$ time python ray_test.py
Setting num_cpus to # logical cores  = 8
2020-04-28 16:27:43,709 INFO resource_spec.py:205 -- Starting Ray with 17.29 GiB memory available for workers and up to 8.65 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
Killed

real    0m25.205s
user    0m15.056s
sys     0m4.028s

Am I doing something wrong here?

Upvotes: 2

Views: 2831

Answers (1)

Sang
Sang

Reputation: 925

Firstly, Ray doesn't guarantee the CPU affinity or resource isolation. That could be the reason why it have non-saturated CPU usage. (I am not 100% sure though). You can try setting cpu affinity using psutil and see if the cores are still not saturated. (https://psutil.readthedocs.io/en/latest/#psutil.Process.cpu_affinity).

About the result, would you mind trying the newest version of Ray? There was pretty good progress in performance & memory management in Ray from the version 0.7.4.

Upvotes: 1

Related Questions