Reputation: 1882
We use the Language Tasks with PaLM API Firebase Extension and we're finding that the output field for a generated response is truncated.

- We send a prompt (the prompt field in a Cloud Firestore document in the "generate" collection) to PaLM that asks for suggested brand guidelines.
- status.state is "COMPLETED", with no errors.
- output is truncated at ~4500 characters, so it appears output has a cap.

Is there some hard limit on the length of the generated output? If so, what is that and where can we find out more details about this?
Upvotes: 1
Views: 184
Reputation: 8190
I assume the extension you linked to doesn't impose any output limits itself, but the underlying models have finite generation limits. For example, text-bison-001 has an output limit of 1,024 tokens (ref).
You can query the API to find out the limits of the model you're using:
>>> import google.generativeai as palm
>>> palm.get_model('models/text-bison-001').output_token_limit
1024
The max_output_tokens API setting can be used to control the output size, but only up to output_token_limit, not beyond.
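As a sketch of how the clamping works in practice (assuming the legacy google.generativeai PaLM SDK; the effective_max_tokens helper is a hypothetical name, and the commented-out calls need an API key to run):

```python
# Sketch: clamp a requested max_output_tokens to the model's hard limit.
# effective_max_tokens is a hypothetical helper; palm.get_model /
# palm.generate_text are from the legacy google.generativeai PaLM SDK
# and require a configured API key to actually run.

def effective_max_tokens(requested: int, model_limit: int) -> int:
    """Return the max_output_tokens value the API will actually honor."""
    return min(requested, model_limit)

# With an API key configured, the real calls would look like:
# import google.generativeai as palm
# palm.configure(api_key="...")
# limit = palm.get_model("models/text-bison-001").output_token_limit  # 1024
# result = palm.generate_text(
#     model="models/text-bison-001",
#     prompt="Write brand guidelines for ...",
#     max_output_tokens=effective_max_tokens(4096, limit),  # clamped to 1024
# )
```

In other words, asking for more than the model's output_token_limit silently gets you the limit, not an error.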
You can usually work around the limitation with prompt engineering, though, especially since the input token limit is much higher than the output limit. For example:
First prompt:
You are a document-writing bot that produces detailed documentation on apple harvesting machines.
Please write the instruction manual for the ApplePicker-2000, the world's fastest harvester that works via sub-quantum wormhole generation.
Generate the introductory paragraph for the device:
Next prompt:
You are a document-writing bot that produces detailed documentation on apple harvesting machines.
Please write the instruction manual for the ApplePicker-2000, the world's fastest harvester that works via sub-quantum wormhole generation.
Here is the previous section:
<previous output>
Please write the next paragraph of the manual:
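The chaining pattern above can be sketched as a loop that feeds each generated section back into the next prompt. This is a minimal sketch: build_next_prompt and generate_fn are hypothetical names, and in real use generate_fn would wrap an actual PaLM call such as palm.generate_text.

```python
# Sketch: chain several short generations into one long document.
# build_next_prompt / generate_fn are hypothetical names; generate_fn
# would wrap the real model call (e.g. palm.generate_text) in practice.
from typing import Callable, Optional

SYSTEM = (
    "You are a document-writing bot that produces detailed documentation "
    "on apple harvesting machines.\n"
    "Please write the instruction manual for the ApplePicker-2000, the "
    "world's fastest harvester that works via sub-quantum wormhole "
    "generation.\n"
)

def build_next_prompt(previous: Optional[str]) -> str:
    """Assemble the first prompt, or a follow-up prompt that includes the
    previously generated section."""
    if previous is None:
        return SYSTEM + "Generate the introductory paragraph for the device:"
    return (
        SYSTEM
        + "Here is the previous section:\n"
        + previous
        + "\nPlease write the next paragraph of the manual:"
    )

def write_manual(generate_fn: Callable[[str], str], sections: int) -> str:
    """Call the model once per section, feeding each output into the next
    prompt, then join the sections into one document."""
    parts, previous = [], None
    for _ in range(sections):
        previous = generate_fn(build_next_prompt(previous))
        parts.append(previous)
    return "\n\n".join(parts)
```

Each individual call stays under the 1,024-token output cap, while the stitched-together document can be as long as you like (subject to the much larger input limit when you include the previous section).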
Upvotes: 1