Why am I getting different results when I use models with the same weights in different formats - \(.pt) \.onnx \(.bin, .xml)?

Question

I have a model trained on YOLOv5s and is working fine.

This is an input image:

I can get an expected result using pytorch after doing an inference:

This is an output image:

The thing is, I need it in Openvino and regardless if I do the inference using the model in .onnx or .bin and .xml (for openvino) I won't get the expected inference result.

What I get is a vector with this shape (1, 25200, 6). I know that:

25200 is equal to 1x3x80x80 + 1x3x40x40 + 1x3x20x20;
6 = 1 class + 4 (x,y,w,h) + 1 (score);
batch_size = 1

To export it, I used:

!python export.py --data models/custom_yolov5s.yaml --weights /content/bucket_11_03_2022.pt --batch-size 1 --device cpu --include openvino --imgsz 640

and to reproduce the issue I did in two ways:

.onnx:

import cv2
image = cv2.imread('data/cropped.png')

# Resize image to meet network expected input sizes
resized_image = cv2.resize(image, (640, 640))

# Reshape to network input shape
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)

import onnxruntime as onnxrt

onnx_session= onnxrt.InferenceSession("models/bucket_11_03_2022.onnx")
onnx_inputs= {onnx_session.get_inputs()[0].name:input_image.astype(np.float32)}
onnx_output = onnx_session.run(None, onnx_inputs)
img_label = onnx_output[0]
print(onnx_output[0].shape)

Openvino:

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()

net = ie.read_network(
    model="bucket_11_03_2022.xml",
    weights="bucket_11_03_2022.bin",
)
exec_net = ie.load_network(net, "CPU")

output_layer_ir = next(iter(exec_net.outputs))
input_layer_ir = next(iter(exec_net.input_info))

# Text detection models expects image in BGR format
image = cv2.imread("data/cropped.png")

# N,C,H,W = batch size, number of channels, height, width
N, C, H, W = net.input_info[input_layer_ir].tensor_desc.dims

# Resize image to meet network expected input sizes
resized_image = cv2.resize(image, (W, H))

# Reshape to network input shape
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)

plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB));

result = exec_net.infer(inputs={input_layer_ir: input_image})

result['output'].shape

Could you guys help me to get the correct inference (bounding box with score) using .onnx or the IE format (openvino - .bin, .xml)?

The model files are here.

Why am I getting different results when I use models with the same weights in different formats - \(.pt) \.onnx \(.bin, .xml)?

Answers (1)

Related Questions