Reputation: 73
I am using the following demo code:
from PIL import Image
from transformers import CLIPProcessor, CLIPModel
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
image = Image.open("frame0.jpg")
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
print(probs)
which outputs reasonable similarity scores, e.g. tensor([[0.9848, 0.0152]], grad_fn=<SoftmaxBackward0>).
What I want to do is, instead of passing in the image itself, use a vector I have already pre-calculated with vit-large-patch14. However, I can't work out how to supply my own vector to this code.
My vector has the form frame_vector = [0.1512, -0.0351, ... {768 elements}]
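A minimal sketch of the kind of thing I'm after, assuming frame_vector is the projected 768-dim image embedding (image_embeds) from the same CLIP checkpoint; it mirrors the logit computation inside CLIPModel.forward, and the numeric values below are placeholders:
import torch
from transformers import CLIPProcessor, CLIPModel
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
# Placeholder for the pre-computed 768-dim embedding (assumed to already be the
# projected image_embeds, not the raw ViT hidden state).
frame_vector = [0.1512, -0.0351] + [0.0] * 766
image_embeds = torch.tensor(frame_vector).unsqueeze(0)  # shape (1, 768)
# Encode only the text prompts; no image is needed here.
text_inputs = processor(text=["a photo of a cat", "a photo of a dog"], return_tensors="pt", padding=True)
with torch.no_grad():
    text_embeds = model.get_text_features(**text_inputs)
# Normalise both sides and reproduce CLIP's scaled cosine-similarity logits.
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
logit_scale = model.logit_scale.exp()
logits_per_image = logit_scale * image_embeds @ text_embeds.t()
probs = logits_per_image.softmax(dim=1)
print(probs)
Is something along these lines the right way to plug a pre-computed vector in, or does the 768-element vector need further projection first?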
Upvotes: 0
Views: 20