Reputation: 31
I would like to train a YOLO model to detect traffic signs. I have very large images (3840 x 2160) of whole street scenes, each containing 1-3 small traffic signs. I know the model works best when it's trained on small, square images. So which would be the better practice for training my model on these images: resizing the whole street scene and training on the whole image, or extracting the traffic signs from the street scene and training only on the signs?
Thanks for your help.
Upvotes: 0
Views: 3100
Reputation: 6494
I would first try resizing your images to a more suitable size. It's likely you don't need the full resolution for your model to perform well enough for your use case. For sign detection with YOLOv5 specifically, I've seen 416x416 be sufficient.
If that doesn't work in your case, you can also try tiling your images and doing inference on one small chunk at a time.
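As a rough sketch of the tiling idea, the snippet below computes overlapping crop windows for a 3840 x 2160 frame. The function name `tile_boxes`, the 640-pixel tile size, and the 128-pixel overlap are illustrative assumptions, not values from any particular YOLO implementation; the overlap is only there so that a sign cut by one tile border appears whole in a neighbouring tile.

```python
def tile_boxes(width, height, tile=640, overlap=128):
    """Return (x0, y0, x1, y1) crop windows that cover a width x height image.

    `tile` and `overlap` are in pixels and are assumed values for illustration.
    """
    stride = tile - overlap  # distance between the left edges of adjacent tiles

    def starts(size):
        # Image smaller than one tile: a single window starting at 0.
        if size <= tile:
            return [0]
        last = size - tile  # start of the final tile, flush with the border
        positions = list(range(0, last, stride))
        positions.append(last)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(3840, 2160)
print(len(boxes), boxes[0], boxes[-1])
```

Each window can then be cropped and passed to the detector on its own; a detection found in a tile is mapped back to full-image coordinates by adding that tile's `x0` and `y0`. If you also train on tiles, the bounding-box labels need the inverse shift.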
There's an article on other strategies to detect small objects here.
Upvotes: 1
Reputation: 15003
You are accidentally intertwining two different deep learning approaches: when you extract only the traffic signs and you train on those extractions, you are actually performing classification and not object detection.
However, YOLO is an object detection model, and object detection includes both classification and localisation.
My recommendation is to try smaller image sizes while keeping the aspect ratio (i.e. divide both the width and the height by the same number), so that you still train on whole images.
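To make the "divide both sides by the same number" step concrete, here is a minimal sketch that computes the downscaled size for a given longest side, plus a padded size rounded up to a multiple of 32. The helper name `letterbox_size` is hypothetical, and the multiple of 32 is an assumption based on the stride commonly used by YOLO-style networks; check what your implementation actually requires.

```python
def letterbox_size(width, height, long_side=1280, multiple=32):
    """Return ((new_w, new_h), (pad_w, pad_h)) for an aspect-preserving resize.

    (new_w, new_h) is the image scaled so its longest side equals `long_side`.
    (pad_w, pad_h) is that size rounded up to the next multiple of `multiple`
    (an assumed network stride); the difference would be filled with padding.
    """
    longest = max(width, height)
    # Integer arithmetic: both sides are scaled by the same factor long_side / longest.
    new_w = width * long_side // longest
    new_h = height * long_side // longest
    # Ceiling division to round each side up to a multiple of the stride.
    pad_w = -(-new_w // multiple) * multiple
    pad_h = -(-new_h // multiple) * multiple
    return (new_w, new_h), (pad_w, pad_h)


print(letterbox_size(3840, 2160))                 # ((1280, 720), (1280, 736))
print(letterbox_size(3840, 2160, long_side=416))  # ((416, 234), (416, 256))
```

The scaled image keeps the original 16:9 shape, and only the padding changes the canvas size, so the signs themselves are not distorted.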
As for the square-image approach, you will have to test it yourself to see how the model behaves.
Upvotes: 2