I have been looking at the different quotas for Vertex AI.
I checked the "quotas & system limits" page for Vertex AI, and there are thousands of quotas.
I am currently testing the Vertex AI SDKs, specifically with Gemini and other models, and trying things like ChatPrompts, TextPrompts, etc.
E.g.: https://cloud.google.com/vertex-ai/docs/generative-ai/text/test-text-prompts
I would like to limit the API requests per minute/day. Can someone help me understand which of those quotas I should limit?
Thanks
You can add a delay before or after each API request on your side. How to do that depends on the application or language you use.
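One way to do this in Python is a client-side sliding-window limiter that enforces several limits at once (for example, per minute and per day). This is a minimal sketch and not part of the Vertex AI SDK. `RateLimiter` and `call_model` are hypothetical names: `call_model` stands in for whatever SDK call you actually make, and the limits (60 per minute, 1000 per day) are illustrative values only. The limiter blocks the caller until every window has room, so it only throttles requests that go through this one limiter instance in this process. It does not change the quota Google enforces.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class RateLimiter:
    """Client-side limiter: at most max_calls per rolling period (seconds), for each (max_calls, period) pair."""

    def __init__(
        self,
        limits: List[Tuple[int, float]],
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limits = limits
        self._clock = clock
        self._sleep = sleep
        self._max_period = max(period for _, period in limits)
        self._calls: Deque[float] = deque()  # timestamps of recorded calls, oldest first

    def acquire(self) -> None:
        """Block until every window has room, then record the call."""
        while True:
            now = self._clock()
            # Forget calls older than the longest window; no limit needs them any more.
            while self._calls and now - self._calls[0] >= self._max_period:
                self._calls.popleft()
            wait = 0.0
            for max_calls, period in self.limits:
                in_window = [t for t in self._calls if now - t < period]
                if len(in_window) >= max_calls:
                    # Once this call expires, the window holds max_calls - 1 calls, leaving room for one more.
                    oldest = in_window[len(in_window) - max_calls]
                    wait = max(wait, period - (now - oldest))
            if wait <= 0:
                self._calls.append(now)
                return
            self._sleep(wait)


def call_model(prompt: str) -> str:
    # Hypothetical placeholder: replace with your real Vertex AI SDK call.
    return f"response to {prompt!r}"


# Illustrative limits: 60 requests per minute and 1000 per rolling 24 hours.
limiter = RateLimiter([(60, 60.0), (1000, 24 * 60 * 60.0)])


def limited_call(prompt: str) -> str:
    limiter.acquire()  # may sleep until a request is allowed
    return call_model(prompt)
```

Sleeping is a reasonable choice for the per-minute window, but for the per-day window you may prefer to raise an error instead of blocking for hours. Also note that the day window here is a rolling 24 hours, which may not line up with how the console counts per-day quotas.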
A good place to start is https://cloud.google.com/vertex-ai/generative-ai/docs/quotas.
In the quotas list, it is much easier to find the relevant entries if you filter for base_model:gemini-pro and your region of choice.
Once you locate the quota, open the 'more actions' menu (three vertical dots) on the far right of its row; it gives you the option to 'Create usage alert'.
Hope this helps.