Reputation: 1659
I want to run real-time object detection using YOLOv5 on a camera and then generate vector embeddings for cropped images of the detected objects.
I currently generate image embeddings for locally saved images using the function below:
from PIL import Image

def generate_img_embedding(img_file_path):
    images = [
        Image.open(img_file_path)
    ]
    # Encoding a single image takes ~20 ms
    embeddings = embedding_model.encode(images)
    return embeddings
I also start YOLOv5 object detection with image cropping as follows:
import os

def start_camera(productid):
    print("Attempting to start camera")
    # productid = "11011"
    try:
        command = "python ./yolov5/detect.py --source 0 --save-crop --name " + productid + " --project ./cropped_images"
        # os.system blocks until detect.py exits
        os.system(command)
        print("Camera running")
    except Exception as e:
        print("error starting camera!", e)
How can I modify the YOLOv5 model to pass the cropped images into my embedding function in real time?
Upvotes: 1
Views: 2223
Reputation: 492
Just take a look at the detect.py supplied with yolov5, the file you are running. The implementation is pretty short (~150 SLOC); I would recommend re-implementing it or modifying it for your use case.
Key points, omitting a lot of (important, but standard and easily understandable) data transforms and parameter parsing, are as follows:
device = select_device(device)
model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data)
# Code selecting FP16/FP32 omitted here
model.warmup(imgsz=(1 if pt else bs, 3, *imgsz), half=half)
for path, im, im0s, vid_cap, s in dataset:
    im = torch.from_numpy(im).to(device)
    # Image transforms omitted
    pred = model(im, augment=augment, visualize=visualize)  # stage 1: inference
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)  # stage 2: NMS
    for i, det in enumerate(pred):
        # im0 is the original frame for this prediction (its assignment from im0s is omitted here)
        if len(det):
            # Rescale boxes from img_size to im0 size
            det[:, :4] = scale_coords(im.shape[2:], det[:, :4], im0.shape).round()
            # --> This is where you would access detections in real time! <--
Most of the code's logic handles the I/O (in particular, dataset loading is handled by either LoadStreams or LoadImages from yolov5's utils); the rest is just rescaling input images, loading a torch model, and running detection and NMS. No rocket science here.
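To make the rescaling step concrete: the scale_coords call in the loop above maps boxes from the resized, padded network input back to the original frame, which is why crops should be taken from im0 and not from im. Below is a rough pure-Python sketch of that idea for a single box. The name rescale_box is made up for illustration and this is a simplified re-creation, not yolov5's own code; it assumes the input was letterboxed with equal padding on both sides.

```python
def rescale_box(box, net_shape, frame_shape):
    """Map an (x1, y1, x2, y2) box from the letterboxed network input
    back to the original frame. Shapes are (height, width) tuples.

    Hypothetical helper for illustration; assumes symmetric padding.
    """
    net_h, net_w = net_shape
    frame_h, frame_w = frame_shape

    # Resize factor that was applied to the frame to fit the network input.
    gain = min(net_h / frame_h, net_w / frame_w)
    # Padding added on each side to reach the network input size.
    pad_x = (net_w - frame_w * gain) / 2
    pad_y = (net_h - frame_h * gain) / 2

    x1, y1, x2, y2 = box
    # Undo the padding, then undo the resize.
    x1, x2 = (x1 - pad_x) / gain, (x2 - pad_x) / gain
    y1, y2 = (y1 - pad_y) / gain, (y2 - pad_y) / gain

    # Clamp to the frame and round to whole pixels.
    x1, x2 = (round(min(max(v, 0), frame_w)) for v in (x1, x2))
    y1, y2 = (round(min(max(v, 0), frame_h)) for v in (y1, y2))
    return (x1, y1, x2, y2)
```

For example, with a 1280x720 camera frame fed to the network as 640x384, a box at (100, 112, 200, 212) in network coordinates lands at (200, 200, 400, 400) in the frame.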
The least-effort path for you would be to copy the entire thing and implement your embeddings under
for *xyxy, conf, cls in reversed(det):
Instead of saving to file, you would take the box corners (x1, y1, x2, y2) from xyxy and crop the image using e.g. Pillow's Image.crop() or slice the numpy array directly. Which one works for you depends on the implementation of your embedding_model.encode.
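Here is a minimal sketch of the numpy-slicing variant, in case it helps. The helper names crop_detections and embed_crops are my own, not part of yolov5, and the encoder is passed in as a plain callable because I don't know what your embedding_model.encode expects. It assumes each detection row is (x1, y1, x2, y2, conf, cls) in pixel coordinates of the original frame, which is what you get after the rescaling step.

```python
import numpy as np


def crop_detections(frame, detections):
    """Return one cropped array per detection.

    frame: H x W x C image array (the original, un-resized frame).
    detections: iterable of (x1, y1, x2, y2, conf, cls) rows in pixels.
    Hypothetical helper for illustration, not a yolov5 function.
    """
    height, width = frame.shape[:2]
    crops = []
    for x1, y1, x2, y2, conf, cls in detections:
        # Clamp to the frame so out-of-range boxes do not wrap around.
        x1, x2 = max(0, int(x1)), min(width, int(x2))
        y1, y2 = max(0, int(y1)), min(height, int(y2))
        if x2 > x1 and y2 > y1:
            # NumPy images are indexed [row, column], i.e. [y, x].
            crops.append(frame[y1:y2, x1:x2].copy())
    return crops


def embed_crops(crops, encode):
    """Pass the crops to a caller-supplied encoder; skip empty frames."""
    return encode(crops) if crops else []
```

Inside the detection loop you would call something like crop_detections(im0, det.cpu().tolist()) and hand the result to embed_crops. If your encoder wants PIL images, as your current function suggests, and the frames are in OpenCV's BGR order, you would probably convert each crop first, e.g. Image.fromarray(np.ascontiguousarray(crop[..., ::-1])).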
Upvotes: 1