racerX

Reputation: 1092

Deploying multiple models to same endpoint in Vertex AI

Our use case is as follows: we have multiple custom-trained models (in the hundreds, and the number keeps growing because users of our application can create models through the UI, which we then train and deploy on the fly), so deploying each model to a separate endpoint is expensive, as Vertex AI charges per node used. Based on the documentation it seems that we can deploy models of different types to the same endpoint, but I am not sure how that would work. Let's say I have 2 different custom-trained models deployed to the same endpoint using custom containers for prediction, and say I specify a 50/50 traffic split between the two models. Now, how do I send a request to a specific model? Using the Python SDK, we make calls to the endpoint like so:

from google.cloud import aiplatform
endpoint = aiplatform.Endpoint(endpoint_id)
prediction = endpoint.predict(instances=instances)

# where endpoint_id is the id of the endpoint and instances are the observations for which a prediction is required

My understanding is that in this scenario, Vertex AI will route some calls to one model and some to the other based on the traffic split. I could use the parameters field, as described in the docs, to name the intended model and then handle the request accordingly in the custom prediction container, but some calls would still end up at a model that cannot process them (Vertex AI is not going to send every request to every model, otherwise the traffic split wouldn't make sense). How do I then deploy multiple models to the same endpoint and make sure that every prediction request is guaranteed to be served?
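
To make the idea concrete, what I have in mind with the parameters field is something like the sketch below; the model_name key is only an illustration of how the container could be told which model to use.

from google.cloud import aiplatform

endpoint = aiplatform.Endpoint(endpoint_id)
# Hypothetical: name the target model in the request parameters so the custom
# container can dispatch the request to that model itself.
prediction = endpoint.predict(
    instances=instances,
    parameters={"model_name": "model_a"},
)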

Upvotes: 1

Views: 4559

Answers (1)

Kabilan Mohanraj

Reputation: 1916

This documentation describes a use case where two models are trained on the same feature set and share the incoming prediction traffic. As you have correctly understood, this does not apply to models trained on different feature sets, that is, different models.

Unfortunately, deploying different models to the same endpoint using only one node is not currently possible in Vertex AI. There is an ongoing feature request that is being worked on, but we cannot provide an exact ETA on when that feature will be available.

I reproduced the multi-model setup and observed the following.

Traffic Splitting

I deployed two different models to the same endpoint and sent prediction requests to it. With a 50-50 traffic split, I saw errors implying that requests were being routed to a model that could not handle them.
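
For reference, a minimal sketch of how such a two-model, 50/50 deployment can be set up with the Python SDK; the endpoint and model IDs and the machine type below are placeholders, not values from my test.

from google.cloud import aiplatform

endpoint = aiplatform.Endpoint(endpoint_id)   # placeholder endpoint ID
model_a = aiplatform.Model(model_a_id)        # placeholder model IDs
model_b = aiplatform.Model(model_b_id)

# The first deployed model receives 100% of the traffic by default.
model_a.deploy(endpoint=endpoint, machine_type="n1-standard-2")

# Giving the second deployment 50% rescales the split to 50/50.
model_b.deploy(endpoint=endpoint, machine_type="n1-standard-2", traffic_percentage=50)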

Cost Optimization

When multiple models are deployed to the same endpoint, they are deployed to separate, independent nodes. So, you will still be charged for each node used. Also, node autoscaling happens at the model level, not at the endpoint level.
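
To illustrate the model-level scaling: the machine type and replica bounds are arguments of each individual deploy() call rather than properties of the endpoint, so they apply per deployed model (the values below are placeholders).

from google.cloud import aiplatform

endpoint = aiplatform.Endpoint(endpoint_id)   # placeholder endpoint ID
model_a = aiplatform.Model(model_a_id)        # placeholder model ID

# Each deployed model gets its own nodes and its own autoscaling range,
# and is therefore billed and scaled independently of the other models
# deployed to the same endpoint.
model_a.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=3,
)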

A plausible workaround would be to pack all your models into a single container and use custom HTTP server logic to route each prediction request to the appropriate model. This could be driven by the parameters field of the prediction request body. The custom logic would look something like this:

import os

import numpy as np
from fastapi import FastAPI, Request

app = FastAPI()

# _preprocessor, _random_forest_model, _decision_tree_model and _class_names
# are assumed to be loaded once at container startup (see the sketch below).

@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    body = await request.json()
    parameters = body["parameters"]
    instances = body["instances"]
    inputs = np.asarray(instances)
    preprocessed_inputs = _preprocessor.preprocess(inputs)

    # Dispatch to the model named in the request's parameters field.
    print(parameters["model_name"])
    if parameters["model_name"] == "random_forest":
        outputs = _random_forest_model.predict(preprocessed_inputs)
    else:
        outputs = _decision_tree_model.predict(preprocessed_inputs)

    return {"predictions": [_class_names[class_num] for class_num in outputs]}

Upvotes: 1
