Niraj Agrawal

Reputation: 11

Why am I getting this error "google.api_core.exceptions.ResourceExhausted: 429 received trailing metadata size exceeds limit"?

I am new to Google Cloud Platform. I have created an endpoint after uploading a model to Google Vertex AI. But when I run the prediction function (Python) suggested in the sample request, I get this error:

Traceback (most recent call last):
  File "C:\Users\My\anaconda3\lib\site-packages\google\api_core\grpc_helpers.py", line 67, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "C:\Users\My\anaconda3\lib\site-packages\grpc\_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "C:\Users\My\anaconda3\lib\site-packages\grpc\_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.RESOURCE_EXHAUSTED
    details = "received trailing metadata size exceeds limit"
    debug_error_string = "{"created":"@1622724354.768000000","description":"Error received from peer ipv4:***.***.***.**","file":"src/core/lib/surface/call.cc","file_line":1063,"grpc_message":"received trailing metadata size exceeds limit","grpc_status":8}">

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "b.py", line 39, in <module>
    predict_custom_trained_model_sample(
  File "b.py", line 28, in predict_custom_trained_model_sample
    response = client.predict(
  File "C:\Users\My\anaconda3\lib\site-packages\google\cloud\aiplatform_v1\services\prediction_service\client.py", line 445, in predict
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "C:\Users\My\anaconda3\lib\site-packages\google\api_core\gapic_v1\method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "C:\Users\My\anaconda3\lib\site-packages\google\api_core\grpc_helpers.py", line 69, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.ResourceExhausted: 429 received trailing metadata size exceeds limit

The code I executed for prediction is:

from typing import Dict
from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value


def predict_custom_trained_model_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    # The format of each instance should conform to the deployed model's prediction input schema.
    instance = json_format.ParseDict(instance_dict, Value())
    instances = [instance]
    parameters_dict = {}
    parameters = json_format.ParseDict(parameters_dict, Value())
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    print("response")
    print(" deployed_model_id:", response.deployed_model_id)
    # The predictions are a google.protobuf.Value representation of the model's predictions.
    predictions = response.predictions
    for prediction in predictions:
        print(" prediction:", dict(prediction)) 

After running this code I got the error above. If anyone knows about this issue, please help.

Upvotes: 1

Views: 3568

Answers (1)

Tsvi Sabo

Reputation: 675

A few things to consider:

  1. Profile your custom container model and make sure its predict API function isn't unexpectedly slow.
  2. Allow your prediction service to serve using multiple workers (see the sketch after this list).
  3. Increase the number of replicas in Vertex, or move to stronger machine types, as long as you keep seeing an improvement (also sketched below).
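
For points 2 and 3, a minimal sketch, assuming a custom container served by gunicorn and the high-level google-cloud-aiplatform SDK (the project, model ID, machine type, and replica counts below are placeholders, not recommendations):

from google.cloud import aiplatform

# Point 2: if the custom container wraps a WSGI app, start the model server
# with several workers, e.g. in the container's CMD:
#   gunicorn --workers 4 --bind 0.0.0.0:8080 app:app

# Point 3: redeploy with more replicas and/or a stronger machine type.
aiplatform.init(project="my-project", location="us-central1")  # placeholder project

model = aiplatform.Model("MODEL_ID")  # placeholder model resource ID

endpoint = model.deploy(
    machine_type="n1-standard-8",  # stronger machine type
    min_replica_count=2,           # Vertex autoscales between min and max
    max_replica_count=5,
)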

However, assuming most of your prediction calls go through successfully and the service is only rarely unavailable, there's something worth doing first on the client side.

Configure your prediction client to use Retry (exponential backoff):

from google.api_core.retry import Retry, if_exception_type
import requests.exceptions
from google.auth import exceptions as auth_exceptions
from google.api_core import exceptions

if_error_retriable = if_exception_type(
    exceptions.GatewayTimeout,
    exceptions.TooManyRequests,
    exceptions.ResourceExhausted,
    exceptions.ServiceUnavailable,
    exceptions.DeadlineExceeded,
    requests.exceptions.ConnectionError,  # The last three might be overkill
    requests.exceptions.ChunkedEncodingError,
    auth_exceptions.TransportError,
)


def _get_retry_arg():
    return Retry(
        predicate=if_error_retriable,
        initial=1.0,     # Initial delay
        maximum=4.0,     # Maximum delay
        multiplier=2.0,  # Delay's multiplier
        deadline=9.0,    # After 9 secs it won't try again and will raise
    )
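
With these settings the client waits roughly 1s, then 2s, then 4s (capped at the maximum, with some jitter) between attempts, and stops retrying once the 9-second deadline has passed, re-raising the last error.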

def predict_custom_trained_model_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    ...
    response = client.predict(
        endpoint=endpoint,
        instances=instances,
        parameters=parameters,
        timeout=SOME_VALUE_IN_SEC,
        retry=_get_retry_arg(),
    )
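
Note: predict on aiplatform.gapic.PredictionServiceClient is synchronous, so no await is needed here; await would only apply if you switched to the async variant, PredictionServiceAsyncClient.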

Upvotes: 0
