Dion Neo

Reputation: 19

How can I increase the rate limits or batch requests for the Google Vertex AI Bison API?

I'm testing out the Google PaLM API to recursively summarize a long text and have run into rate-limiting issues, which raised a few questions I'd like to verify.

It seems that the rate limit for requests to the Bison API is 60/min (this seems quite low).

  1. Is there a way to batch the requests made to the Bison API? And will that allow me to make more inferences per second?
  2. Is there a way to increase the rate limits? 60/min seems too low and not fit for production use.

Thanks!

I tried looking into these documents:

  1. Rate limit documents: Table for rate limits

  2. Increasing rate limits, but it seems like it's not meant for the Bison model

Upvotes: 1

Views: 1391

Answers (1)

Okry Dokry

Reputation: 135

  1. You can make batch requests to text-bison: https://cloud.google.com/vertex-ai/docs/generative-ai/text/batch-prediction-genai The linked document describes how to prepare your batch inputs and invoke a batch request (see the sketch below this list).

You can also submit a batch request as part of a Vertex AI pipeline job: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/generative_ai/batch_eval_llm.ipynb

  2. You can request a quota increase in your project (in the Google Cloud console, under IAM & Admin > Quotas). The name of the quota in this case is "Online prediction requests per base model per minute per region per base_model", and you would request a quota change for the region and model of interest, e.g. us-central1 and text-bison.
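For reference, here's a minimal sketch of what a batch request can look like with the Python SDK (google-cloud-aiplatform). The project ID, region, and gs:// paths are placeholders, and the input is assumed to be a JSONL file in Cloud Storage with one {"prompt": "..."} object per line, as described in the linked batch-prediction document:

    # Minimal sketch of a text-bison batch prediction job, assuming the
    # google-cloud-aiplatform SDK is installed and the caller is authenticated.
    # Project ID, region, and bucket paths below are placeholders.
    import vertexai
    from vertexai.language_models import TextGenerationModel

    vertexai.init(project="your-project-id", location="us-central1")

    model = TextGenerationModel.from_pretrained("text-bison")

    # Input: a JSONL file in Cloud Storage, one {"prompt": "..."} object per line.
    # Output: JSONL result files written under the destination prefix.
    batch_job = model.batch_predict(
        dataset="gs://your-bucket/batch_prompts.jsonl",
        destination_uri_prefix="gs://your-bucket/batch_output",
        model_parameters={"temperature": 0.2, "max_output_tokens": 256},
    )

    batch_job.wait()  # blocks until the job succeeds or fails
    print(batch_job.state)

Note that batch prediction runs asynchronously and is governed by its own quotas rather than the online 60 requests/min limit, so you trade latency for throughput.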

Upvotes: 0
