Nick Mint
Nick Mint

Reputation: 145

Problem returning values from a Ray.remote parallel Python 3 function

I have been working on an EC2 parallel cloud application using Ray for setting up the cluster and scheduling the tasks. However, I have a problem that has been perplexing me. The following is a very simplified program (running on 3 workers) that illustrates it:-

import numpy as np
import subprocess as sp
import boto3
import ray

redadd=sp.check_output("hostname -I",shell=True).decode("utf-8").rstrip()
ray.init(redis_address=redadd+":6379")
pop=np.ones((3,3))

@ray.remote
def test_loop(n):                                           
    return n*pop[n,:]

for i in range(0,2): 
    print("iteration ",i)
    print(pop)
    if __name__=='__main__':
        ans=ray.get([test_loop.remote(n) for n in range(0,3)])
    print("ans ",ans)
    pop=2*pop

ray.shutdown()

The output of this is:-

2019-07-03 23:35:06,078 WARNING worker.py:1337 -- WARNING: Not updating   worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
iteration  0
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
ans  [array([0., 0., 0.]), array([1., 1., 1.]), array([2., 2., 2.])]
iteration  1
[[2. 2. 2.]
 [2. 2. 2.]
 [2. 2. 2.]]
ans  [array([0., 0., 0.]), array([1., 1., 1.]), array([2., 2., 2.])]

Ignoring the warning, the puzzle is that the value of pop is read during the first iteration of the test_loop, returning the three product vectors in parallel. However, on the next iteration, where pop's value has been doubled, the test_loop ignores it and retains the old value. Can anyone explain what is going on here, and how to get the remote function call to work as I would expect?

N.B. I don't think that this is a scope problem: pop is globally defined and is not re-assigned in test_loop.

Upvotes: 1

Views: 1485

Answers (1)

Robert Nishihara
Robert Nishihara

Reputation: 3362

Each Ray "worker" runs in a separate process (as opposed to a thread), so there aren't any globally scoped variables that are shared between all of the workers.

When you define the test_loop remote function, the function definition is serialized and shipped to each worker process (along with the pop array). So each worker process (in addition to your main script) has its own copy of pop. When you modify pop in the main script, that doesn't affect the other copies of the pop array.

If you want your worker processes to have state that gets mutated when methods run, you may want to use Ray actors.

Upvotes: 1

Related Questions