Matheus Torquato

Reputation: 1629

GCP Gemini API - Send multimodal prompt requests using local image

On this page, Google shows sample code for sending multimodal prompt requests (image + text).

    import vertexai
    
    from vertexai.generative_models import GenerativeModel, Part
    
    # TODO(developer): Update and un-comment below line
    # project_id = "PROJECT_ID"
    
    vertexai.init(project=project_id, location="us-central1")
    
    model = GenerativeModel(model_name="gemini-1.5-flash-001")
    
    image_file = Part.from_uri(
        "gs://cloud-samples-data/generative-ai/image/scones.jpg", "image/jpeg"
    )
    
    # Query the model
    response = model.generate_content([image_file, "what is this image?"])
    print(response.text)

It works fine.

I would like to do the same thing with an image loaded from the local filesystem. Something like this:

    from PIL import Image

    image_part = Part.from_image(Image.load_from_file("image.jpg"))
    response = model.generate_content([image_part,"what is this image?"])

This follows the docstring of the Part class in vertexai/generative_models/_generative_models.py, but it raises this exception:

    module 'PIL.Image' has no attribute 'load_from_file'

Is there an alternative to Part.from_uri for local images?

Upvotes: 1

Views: 1696

Answers (1)

Matheus Torquato

Reputation: 1629

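It turns out that image_part = Part.from_image(Image.load_from_file("image.jpg")) does work. The problem with the snippet in the question is the import: Image must be imported from vertexai.generative_models, not from PIL. PIL's Image module has no load_from_file function, which is why the exception is raised.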

The code below works as expected.

    import vertexai

    # Image comes from the Vertex AI SDK here, not from PIL
    from vertexai.generative_models import GenerativeModel, Part, Image

    model_id: str = "gemini-1.5-pro-preview-0409"
    project_id: str = "YOUR_GCP_PROJECT"
    region: str = "YOUR_GCP_REGION"

    vertexai.init(project=project_id, location=region)

    model: GenerativeModel = GenerativeModel(model_name=model_id)

    # Load the local file and wrap it in a Part alongside the text prompt
    response = model.generate_content([
        Part.from_image(Image.load_from_file("image.jpg")),
        "What is shown in this image?",
    ])

    print(response.text)
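
If you already have the image as raw bytes, or want to set the MIME type explicitly, Part.from_data may be another option. This is only a minimal sketch, assuming Part.from_data(data=..., mime_type=...) is available in your installed SDK version. read_image_bytes and local_image_part are hypothetical helper names of my own, not part of the SDK, and the MIME type is guessed from the file extension rather than from the file contents.

```python
import mimetypes
from pathlib import Path


def read_image_bytes(path: str) -> tuple[bytes, str]:
    """Hypothetical helper: return (raw bytes, MIME type) for a local image."""
    # Guess the MIME type from the extension only, e.g. ".jpg" -> "image/jpeg"
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    return Path(path).read_bytes(), mime_type


def local_image_part(path: str):
    """Hypothetical helper: build a Part from a local image file."""
    # Imported lazily so read_image_bytes can be used without the SDK installed
    from vertexai.generative_models import Part

    data, mime_type = read_image_bytes(path)
    # Assumption: Part.from_data accepts raw bytes plus a MIME type string
    return Part.from_data(data=data, mime_type=mime_type)
```

With vertexai.init and the model set up as above, the call would then look like model.generate_content([local_image_part("image.jpg"), "What is shown in this image?"]).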

Upvotes: 3
