Reputation: 5636
the main doc where this is discussed isn't exactly clear.
"Cached content can be any of the MIME types supported by Gemini multimodal models. For example, you can cache a large amount of text, audio, or video. You can specify more than one file to cache."
My thinking was that I could use the Context Cache to cache an entire prompt with multi-modal input (e.g. a list of mixed images and text) in the same way a system prompt works. Like it is just prepended before any downstream prompt i use that references the cache. For e.g., I spend a million tokens teaching Gemini to do something in a multi-modal cached prompt and it can be used repeatedly (prepended before) a much smaller prompt
However, the statement above could also be read as you can only cache specific MIME types. For e.g. instead of an entire multi-modal prompt, I can only cache the images from that prompt. If thats true, and my intention is to use the cached files in a downstream multi-modal prompt, how would you reference each image uniquely in the downstream prompt?
I realize the feature is still Pre-GA, but I hope we get some more examples in these docs
Upvotes: 0
Views: 149