Joel

Reputation: 8978

429 RESOURCE_EXHAUSTED for Claude Sonnet 3.5 on Vertex AI

I'm trying to test the Anthropic Claude models on Google Vertex AI, but every request fails with a 429 error. I haven't been able to get a single request through, so I don't think the cause is overuse.

Is this expected, or should I be able to make at least a few requests?

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
"https://europe-west1-aiplatform.googleapis.com/v1/projects/$PROJECT_ID/locations/europe-west1/publishers/anthropic/models/claude-3-5-sonnet@20240620:streamRawPredict"

This gives the following error:

[{
  "error": {
    "code": 429,
    "message": "Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: anthropic-claude-3-5-sonnet. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.",
    "status": "RESOURCE_EXHAUSTED"
  }
}
]

I've been following this guide: https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude

Upvotes: 1

Views: 650

Answers (1)

McMaco

Reputation: 180

For Claude models, a quota applies for each region where the model is available. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.

The default quota limits and supported context length for Claude 3.5 Sonnet v2 are shown in the image below:

[Image: default quota limits and supported context length for Claude 3.5 Sonnet v2]

If you want to increase any of your quotas for Generative AI on Vertex AI, you can request a quota increase in the Google Cloud console. To learn more about quotas, see the "Work with quotas" documentation page.
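
As a rough sketch, and not something taken from the guide: if the quota in your region is above zero and you only hit the QPM limit now and then, a common way to cope is to retry 429 responses with exponential backoff. The helper names `backoff_delay` and `retry_on_429` and the `MAX_ATTEMPTS` variable are made up for illustration. This will not help if the quota for that model in that region is effectively zero, which may be what is happening when no request ever succeeds.

```shell
# Hypothetical helper: seconds to wait before retry number $1 (1-based).
# The delay doubles on each attempt and is capped at $2 seconds (default 60).
backoff_delay() {
  local attempt
  local cap
  local delay
  local i
  attempt=$1
  cap=${2:-60}
  delay=1
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}

# Hypothetical helper: run "$@", which must print only an HTTP status code,
# and retry while that code is 429, up to MAX_ATTEMPTS tries (default 5).
# Prints the last status code seen.
retry_on_429() {
  local max
  local attempt
  local status
  max=${MAX_ATTEMPTS:-5}
  attempt=1
  while :; do
    status=$("$@")
    if [ "$status" != "429" ] || [ "$attempt" -ge "$max" ]; then
      echo "$status"
      return 0
    fi
    sleep "$(backoff_delay "$attempt")"
    attempt=$((attempt + 1))
  done
}

# Example usage with the request from the question (left commented out
# because it needs gcloud credentials and network access):
# retry_on_429 curl -s -o /dev/null -w '%{http_code}' -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json; charset=utf-8" \
#   -d @request.json \
#   "https://europe-west1-aiplatform.googleapis.com/v1/projects/$PROJECT_ID/locations/europe-west1/publishers/anthropic/models/claude-3-5-sonnet@20240620:streamRawPredict"
```

The curl options `-o /dev/null -w '%{http_code}'` discard the response body and print only the status code, which is what `retry_on_429` expects. Since quotas are per region, trying another region where the model is available may also be worth a shot.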

Upvotes: 0
