Reputation: 1652
I am following this tutorial in the Ray documentation, specifically this part:
client = serve.start()
config = {"num_replicas": 3}
client.create_backend("tf:v1", TFMnistModel, TRAINED_MODEL_PATH, config=config)
client.create_endpoint("tf_classifier", backend="tf:v1", route="/mnist")
The following sends a single sample to the backend:
sample_data = np.random.randn(28 * 28).tolist()
resp = requests.get(
    "http://localhost:8000/mnist",
    json={"array": sample_data},
)
How can I send multiple samples at the same time so that they are processed in parallel, making use of all the cores? For example, 100 MNIST samples created using the following:
# 100 MNIST sample 28x28
sample_data = np.random.randn(100 * 28 * 28).reshape((100, 28, 28))
Upvotes: 1
Views: 731
Reputation: 185
The requests.get() call is blocking, so you're right that we shouldn't just call it 100 times in a for loop.
To send multiple samples in parallel over HTTP, you'll want several concurrent connections. The following code sample using asyncio and aiohttp shows one way to accomplish this: https://gist.github.com/architkulkarni/0bd0a92c3195c58ec460a5a0e5eb0e88#file-benchmark-py (you'll need to edit url and add the JSON input to session.get() to match your example).
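Adapted to this question's endpoint, a minimal sketch might look like the following. It assumes the Serve endpoint from the question is running at http://localhost:8000/mnist; the names URL, samples, fetch, and main are just illustrative:

```python
import asyncio

import aiohttp
import numpy as np

# Assumed endpoint from the question; adjust to your deployment.
URL = "http://localhost:8000/mnist"

# 100 MNIST-sized samples, flattened so they are JSON-serializable.
samples = [np.random.randn(28 * 28).tolist() for _ in range(100)]

async def fetch(session, sample):
    # Each task holds one in-flight request; the event loop
    # interleaves them so all 100 are outstanding concurrently.
    async with session.get(URL, json={"array": sample}) as resp:
        return await resp.json()

async def main():
    # One session shares the connection pool across all requests.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, s) for s in samples]
        return await asyncio.gather(*tasks)

# To run against a live endpoint:
# results = asyncio.run(main())
```

The fan-out happens in asyncio.gather: all 100 requests are dispatched before any response is awaited, so the replicas can serve them in parallel.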
Another way is to skip HTTP entirely and query the backend from Python using Ray Serve's ServeHandle API:
import numpy as np
import ray

handle = client.get_handle("tf_classifier")
# handle.remote() returns a future immediately, so all 100 requests
# are dispatched before we block on the results with ray.get().
futures = [handle.remote({"array": np.random.randn(28 * 28)}) for _ in range(100)]
results = ray.get(futures)
Upvotes: 6