Reputation: 1652
I am following this tutorial in the Ray documentation, specifically this part:
client = serve.start()
config = {"num_replicas": 3}
client.create_backend("tf:v1", TFMnistModel, TRAINED_MODEL_PATH, config=config)
client.create_endpoint("tf_classifier", backend="tf:v1", route="/mnist")
The following sends a single sample to the backend:
sample_data = np.random.randn(28 * 28).tolist()
resp = requests.get(
    "http://localhost:8000/mnist",
    json={"array": sample_data},
)
How can I send multiple samples at the same time so that they are processed in parallel, making use of all the cores? For example, 100 MNIST samples created using the following:
# 100 MNIST sample 28x28
sample_data = np.random.randn(100 * 28 * 28).reshape((100, 28, 28))
Upvotes: 1
Views: 731
Reputation: 185
The requests.get() call is blocking, so you're right that we shouldn't just call it 100 times in a for loop.
To send multiple samples in parallel over HTTP, you'll want several concurrent connections. The following code sample using asyncio and aiohttp shows one way to accomplish this: https://gist.github.com/architkulkarni/0bd0a92c3195c58ec460a5a0e5eb0e88#file-benchmark-py (you'll need to edit url and add the JSON input to session.get() to match your example).
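Adapted to this question's endpoint, a minimal sketch might look like the following. It assumes the Serve endpoint from the question is running at http://localhost:8000/mnist; the names URL, samples, fetch, and main are just illustrative:

```python
import asyncio

import aiohttp
import numpy as np

# Assumed endpoint from the question; adjust to your deployment.
URL = "http://localhost:8000/mnist"

# 100 MNIST-sized samples, flattened so they are JSON-serializable.
samples = [np.random.randn(28 * 28).tolist() for _ in range(100)]

async def fetch(session, sample):
    # Each task holds one in-flight request; the event loop
    # interleaves them so all 100 are outstanding concurrently.
    async with session.get(URL, json={"array": sample}) as resp:
        return await resp.json()

async def main():
    # One session shares the connection pool across all requests.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, s) for s in samples]
        return await asyncio.gather(*tasks)

# To run against a live endpoint:
# results = asyncio.run(main())
```

The fan-out happens in asyncio.gather: all 100 requests are dispatched before any response is awaited, so the replicas can serve them in parallel.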
Another way is to skip HTTP entirely and query the backend from Python using Ray Serve's ServeHandle API:
import numpy as np
import ray

handle = client.get_handle("tf_classifier")
# handle.remote() returns a future immediately, so all 100 requests
# are dispatched before we block on the results with ray.get().
futures = [handle.remote({"array": np.random.randn(28 * 28)}) for _ in range(100)]
results = ray.get(futures)
Upvotes: 6