Reputation: 17
I am using the Command R+ model on Azure. I know that Command R+ supports a context length of 128K. Now I want to know the maximum token limit for the output response for Command R+, so I can set the max_tokens value in my requests accordingly.
Upvotes: 0
Views: 666
Reputation: 3721
The Cohere Command R+ model on Azure supports a context length of up to 128,000 tokens, and that window is shared between your input prompt and the generated response.
The output is not capped by a separate fixed number; it is governed by the max_tokens parameter you set in your request, which dictates how many tokens the model may generate in the response. Given the 128,000-token context length, you can allocate to the output whatever portion of the window is left after your input prompt.
For practical purposes, set max_tokens to a value that keeps your input and output together within the total context length. For instance, if your input prompt is expected to use 1,000 tokens, you could in principle set max_tokens to 127,000, though in real-world applications a value like 2,000 to 4,000 tokens per response is more typical for managing performance and response time.
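A minimal sketch of that budgeting arithmetic (the constant and helper name are illustrative, not part of the Cohere SDK):

```python
CONTEXT_WINDOW = 128_000  # Command R+ context length in tokens

def remaining_output_budget(prompt_tokens: int, reserve: int = 0) -> int:
    """Tokens left for the response after the prompt (and an optional
    safety reserve) are subtracted from the context window."""
    return max(CONTEXT_WINDOW - prompt_tokens - reserve, 0)

# A 1,000-token prompt leaves up to 127,000 tokens for the output.
print(remaining_output_budget(1_000))  # 127000
```

In practice you would pass something far smaller than this upper bound as max_tokens, but the helper makes the input-plus-output constraint explicit.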
Here is how you might set it in your API call:
import cohere

co = cohere.Client('your-api-key')

# Command R+ is served through the Chat endpoint; the legacy
# generate endpoint does not support the command-r-plus model.
response = co.chat(
    model='command-r-plus',
    message='Your prompt text goes here',
    max_tokens=2000,  # adjust this value based on your needs and the model's limits
    temperature=0.5,
)

print('Generated text:', response.text)
Ensure you monitor and adjust the max_tokens setting based on your application's requirements and the complexity of the tasks at hand.
Upvotes: 1