Reputation: 31
I am making a face mask detection project and I trained my model using ultralytics/yolov5. I saved the trained model as an ONNX file; you can find the model file here: model.onnx. Now I want to use this model.onnx with OpenCV to detect face masks in real time. The input image size during training was 320x320. You can visualize this model using netron. I have written this code to capture an image from the webcam and pass it to model.onnx to predict my bounding boxes. The code is as follows:
import json

import numpy as np
import onnxruntime

model_path = "model.onnx"

def predict(img):
    # create an inference session for the exported ONNX model
    session = onnxruntime.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    # reshape the frame to the NCHW layout used during training (1, 3, 320, 320)
    img = img.reshape((1, 3, 320, 320))
    # round-trip through JSON, then back to a float32 numpy array
    data = json.dumps({'data': img.tolist()})
    data = np.array(json.loads(data)['data']).astype('float32')
    result = session.run([output_name], {input_name: data})
    result = np.array(result)
    print(result.shape)
The output of result.shape is (1, 1, 3, 40, 40, 85). Can anyone help me interpret this shape, and how can I use this result array to predict my class, bounding box and confidence?
Upvotes: 3
Views: 3572
Reputation: 319
I've never worked with a pure yolov5 model, but here's the output format for yolov5s. It looks like it should be similar.
output tensor structure (yolov5s):
output_tensor[a, b, c, d]
    a -> image index (if your input is a batch of images, this tells you which image's output you're looking at; if your input is just one image, leave this as 0)
    b -> index of image in batch
    c -> index of proposed bounding boxes
    d -> information about a bounding box:
        0, 1 -> x and y coordinate of the bounding box center
        2, 3 -> width and height of the bounding box
        4 -> bounding box confidence
        5 - 84 -> single class confidences (one score per class)
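To make that concrete, here is a rough sketch (not from the question's project) of how you could turn an output laid out like this into class / box / confidence predictions. It assumes the exported model already applies yolov5's detect-layer decoding, so the x/y/w/h values are in 320x320 network-input pixels, and that each box carries 85 values as in the shape printed above; the decode name, the 0.5 / 0.45 thresholds and the use of cv2.dnn.NMSBoxes are my own choices, not anything dictated by the model:

import cv2
import numpy as np

CONF_THRESHOLD = 0.5   # assumed objectness/score threshold
NMS_THRESHOLD = 0.45   # assumed IoU threshold for non-maximum suppression

def decode(result, frame_w, frame_h, input_size=320):
    # Flatten everything except the last axis so each row is one proposed box:
    # [cx, cy, w, h, box confidence, class confidences...]
    preds = np.array(result).reshape(-1, 85).astype(np.float32)

    boxes, scores, class_ids = [], [], []
    x_scale, y_scale = frame_w / input_size, frame_h / input_size

    for row in preds:
        box_conf = row[4]                    # index 4 -> bounding box confidence
        if box_conf < CONF_THRESHOLD:
            continue
        class_conf = row[5:]                 # indices 5 onwards -> per-class confidences
        class_id = int(np.argmax(class_conf))
        score = float(box_conf * class_conf[class_id])
        if score < CONF_THRESHOLD:
            continue
        cx, cy, w, h = row[0:4]              # indices 0-3 -> box center and size
        # convert from center format in network pixels to top-left format in frame pixels
        left = int((cx - w / 2) * x_scale)
        top = int((cy - h / 2) * y_scale)
        boxes.append([left, top, int(w * x_scale), int(h * y_scale)])
        scores.append(score)
        class_ids.append(class_id)

    # drop overlapping boxes that describe the same object
    keep = cv2.dnn.NMSBoxes(boxes, scores, CONF_THRESHOLD, NMS_THRESHOLD)
    keep = np.array(keep).flatten() if len(keep) > 0 else []
    return [(class_ids[i], scores[i], boxes[i]) for i in keep]

Each returned tuple can then be drawn on the original webcam frame with cv2.rectangle; x_scale / y_scale just map the 320x320 network coordinates back to your frame size.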
Upvotes: 0