Framefact

Reputation: 11

OpenVINO: set_input_tensor() must be called on a function with exactly one parameter

RuntimeError: 'inputs.size() == 1' when setting input tensor for OpenVINO model with multiple inputs

I'm trying to run a model in OpenVINO that was originally built for PyTorch, and I'm running into an issue when setting the input tensors.

I have a CLIP-like model that takes both images and text as inputs and returns multiple output tensors (e.g., image embeddings and text embeddings). I'm using ov::InferRequest::set_input_tensor to set the inputs, but I keep getting the error quoted in the title. Here is my code:

import torch
from openvino.runtime import Core, Tensor

def toto(model, processor, image1, image2, image3, text1, text2, text3, is_openvino=False):
    inputs = processor(text=[text1, text2, text3], images=[image1, image2, image3], return_tensors="pt", padding=True)

    if is_openvino:
        # Convert image inputs to OpenVINO tensors
        image_inputs = inputs["pixel_values"].numpy()

        # Process image inputs
        image_embeddings = []
        for image in image_inputs:
            image_tensor = Tensor(image)
            infer_request = model.create_infer_request()
            infer_request.set_input_tensor(image_tensor)
            infer_request.infer()

            # Assume the first output tensor corresponds to image embeddings
            image_embeddings.append(infer_request.get_output_tensor(0))

        # Convert text inputs to OpenVINO tensors
        text_inputs = inputs["input_ids"].numpy()

        # Process text inputs
        text_embeddings = []
        for text in text_inputs:
            text_tensor = Tensor(text)
            infer_request.set_input_tensor(text_tensor)
            infer_request.infer()

            # Assume the second output tensor corresponds to text embeddings
            text_embeddings.append(infer_request.get_output_tensor(1))

        # Assuming the embeddings are returned in the correct order
        x1 = image_embeddings[0]
        x2 = image_embeddings[1]
        x3 = image_embeddings[2]
    else:
        # For PyTorch model
        outputs = model(**inputs)
        x1 = outputs.image_embeds[0]
        x2 = outputs.image_embeds[1]
        x3 = outputs.image_embeds[2]

    sim_x1_x2 = torch.nn.functional.cosine_similarity(torch.tensor(x1), torch.tensor(x2), dim=0)
    sim_x1_x3 = torch.nn.functional.cosine_similarity(torch.tensor(x1), torch.tensor(x3), dim=0)

    return sim_x1_x2 > sim_x1_x3

I have also tried using 3 get_infer_output calls, but that failed too.
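For reference, the error text suggests that the one-argument form of set_input_tensor only applies to models with a single input; the OpenVINO Python API also offers set_input_tensor(index, tensor) and accepts a dictionary of inputs in infer(). Below is a minimal sketch of the dictionary approach, not a confirmed fix. It assumes `compiled_model` is an OpenVINO CompiledModel, that the model's input names match the processor's keys (e.g. "input_ids", "pixel_values"), and that the processor values are already NumPy arrays (for example via return_tensors="np"). The helper name `infer_all_inputs` is hypothetical.

```python
def infer_all_inputs(compiled_model, processor_outputs):
    """Run one inference, feeding every model input by name.

    Assumptions (not verified against any specific model):
    - compiled_model behaves like an openvino CompiledModel
    - processor_outputs maps each model input name to a NumPy array
    """
    request = compiled_model.create_infer_request()

    # Build one feed dict covering all model inputs, keyed by input name.
    feed = {}
    for port in compiled_model.inputs:
        name = port.get_any_name()
        feed[name] = processor_outputs[name]

    # A single infer() call receives every input at once, so the
    # one-input restriction of set_input_tensor(tensor) is never hit.
    request.infer(feed)

    # Copy each output out of the request so it outlives the request.
    return [
        request.get_output_tensor(i).data.copy()
        for i in range(len(compiled_model.outputs))
    ]
```

Which output index holds the image embeddings depends on how the model was exported, so it is safer to check the names in compiled_model.outputs than to assume index 0 or 1.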

Upvotes: 1

Views: 69

Answers (0)
