user3057706

Reputation: 73

CLIP computing similarities with known vector

I am using the following demo code:

from PIL import Image

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("frame0.jpg")

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
print(probs)

which outputs good similarity scores (e.g. tensor([[0.9848, 0.0152]], grad_fn=<SoftmaxBackward0>)).

What I want to do, instead of using the image provided, is use a vector I have pre-calculated with vit-large-patch14. However, I can't work out how to supply my own vector to this code.

My vector is of the form frame_vector = [0.1512, -0.0351....{768 elements}]
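For reference, one possible approach, assuming the 768-element vector is CLIP's projected image embedding (i.e. what `model.get_image_features` returns for ViT-L/14), is to compute the text features separately and reproduce the similarity calculation by hand: L2-normalize both embeddings, take their dot product, and scale by the model's learned `logit_scale`. The `frame_vector` below is a random placeholder standing in for the pre-computed vector.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# Placeholder for the pre-computed image embedding (768 elements for ViT-L/14)
frame_vector = torch.randn(1, 768)

# Tokenize the text prompts only; no image is passed to the processor
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    return_tensors="pt",
    padding=True,
)
with torch.no_grad():
    text_features = model.get_text_features(**inputs)

# CLIP compares L2-normalized embeddings scaled by a learned temperature
image_features = frame_vector / frame_vector.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

logit_scale = model.logit_scale.exp()
logits_per_image = logit_scale * image_features @ text_features.t()
probs = logits_per_image.softmax(dim=1)
print(probs)
```

Note this only makes sense if the stored vector comes from the same CLIP checkpoint's image projection head; a raw ViT hidden state (1024-dim for ViT-L/14) would not be comparable to the text embeddings.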

Upvotes: 0

Views: 20

Answers (0)
