Reputation: 5250
I've deployed the Llama 3 model using the Deploy button on the Vertex AI Model Garden Llama 3 card:
https://pantheon.corp.google.com/vertex-ai/publishers/meta/model-garden/llama3
I can make a request using the "Try out Llama 3" side panel on that page & it seems to be working with my deployed model + endpoint. Next I'd like to make a request using curl or Python. The endpoint UI page also has a "sample request" feature, but it's very generic rather than customized to the model, so it isn't much help.
So does anyone have an example request (for this model or another)?
Specifically for the JSON instances & parameters. I can probably figure out the parameters, but I have absolutely no idea what an "instance" is in this context. This seems like the closest related question: Sending http request Google Vertex AI end point
...Google Cloud loves naming something generically, not giving many details on what it is, & then expecting something very specific as a value.
edit: Found the docs on this GCP method: https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/predict
which gives some description, but "The instances that are the input to the prediction call." is not really that helpful.
Upvotes: 1
Views: 1084
Reputation: 8399
Apologies for the poor experience. For now, the best reference is the notebook.
Here's the relevant snippet:
prompt = "What is a car?" # @param {type: "string"}
max_tokens = 50 # @param {type:"integer"}
temperature = 1.0 # @param {type:"number"}
top_p = 1.0 # @param {type:"number"}
top_k = 1.0 # @param {type:"number"}
raw_response = False # @param {type:"boolean"}
# Overrides parameters for inference.
# If you encounter an issue like `ServiceUnavailable: 503 Took too long to respond when processing`,
# you can reduce the output length, e.g. set max_tokens to 20.
instances = [
{
"prompt": prompt,
"max_tokens": max_tokens,
"temperature": temperature,
"top_p": top_p,
"top_k": top_k,
"raw_response": raw_response
}
]
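If you want to call the endpoint from Python, here's a minimal sketch using the google-cloud-aiplatform SDK and reusing the instances list built above (the project, region, and endpoint ID are placeholders to replace with your own values):

from google.cloud import aiplatform

# Placeholders -- replace with your own project, region, and endpoint ID.
PROJECT_ID = "your-project-id"
REGION = "us-central1"
ENDPOINT_ID = "1234567890"

aiplatform.init(project=PROJECT_ID, location=REGION)

# Reference the deployed endpoint by its full resource name.
endpoint = aiplatform.Endpoint(
    f"projects/{PROJECT_ID}/locations/{REGION}/endpoints/{ENDPOINT_ID}"
)

# `instances` is the list built above: one dict per prompt, with the
# sampling parameters carried alongside it.
response = endpoint.predict(instances=instances)
print(response.predictions)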
But please note that the full JSON payload (e.g. to send using curl) is:
{
  "instances": [
    {
      "prompt": "What is a car?",
      "max_tokens": 50,
      "temperature": 1.0,
      "top_p": 1.0,
      "top_k": 1.0,
      "raw_response": false
    }
  ]
}
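For curl, that payload goes to the endpoint's :predict REST method (the format from the predict docs linked in the question). PROJECT_ID, REGION, and ENDPOINT_ID are placeholders for your own values, and this assumes you're authenticated with gcloud:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict" \
  -d '{
    "instances": [
      {
        "prompt": "What is a car?",
        "max_tokens": 50,
        "temperature": 1.0,
        "top_p": 1.0,
        "top_k": 1.0,
        "raw_response": false
      }
    ]
  }'

The response comes back as a JSON object with a "predictions" array containing the generated text.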
Upvotes: 2