Object detection where the object occupies a small part of the image

Question

I trained a road sign detection network. In the training data, the sign occupies the entire frame, like so:

However, in the images I want to run predictions on, road signs occupy a much smaller part of the frame, for example:

Predictions for such images are not very good; however, if I crop to just the sign, the predictions are fine.

How do I go about generating predictions for larger images?

I haven't been able to find an answer in similar questions unfortunately.

dfcastap · Accepted Answer

It sounds like you're now trying to solve a different kind of problem: your network classifies individual signs, but finding ("detecting") signs inside a larger image and then classifying them is an object detection task.

You have (at least) a couple of options:

1. Create a sliding window that sweeps across the image and classifies the crop at each step. When the window lands on the sign, the classifier should return a good prediction. You'll quickly realize, though, that this is not very practical or efficient: the window size and step size become extra parameters to tune, and, as the next option shows, there are object-detection methods designed to solve exactly this problem.
2. Try an object detection architecture. This requires a training dataset different from the one you used for image classification. You'll need many (hundreds or thousands) of the "large" images that contain (and in some cases don't contain) the signs you want to identify. You'll also need an annotation tool to locate and label those signs, and then you can train a network to locate and label them.
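For the sliding-window option, here is a minimal sketch of what the sweep might look like. It assumes the image is a NumPy array and that your trained network is wrapped in a function taking a crop and returning a `(label, score)` pair. The names `sliding_windows`, `detect` and `dummy_classify`, and the window, stride and threshold values, are made up for illustration; `dummy_classify` is only a stand-in for your real classifier.

```python
import numpy as np

def sliding_windows(image, window, stride):
    """Yield (x, y, crop) for every window position that fits inside the image."""
    height, width = image.shape[:2]
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield x, y, image[y:y + window, x:x + window]

def detect(image, classify, window=64, stride=32, threshold=0.9):
    """Classify every crop and keep the windows scored at or above the threshold."""
    hits = []
    for x, y, crop in sliding_windows(image, window, stride):
        label, score = classify(crop)
        if score >= threshold:
            hits.append((x, y, window, window, label, score))
    return hits

def dummy_classify(crop):
    # Hypothetical placeholder: replace with a call to the trained sign classifier.
    # Here the "score" is just the mean brightness of the crop.
    return "sign", float(crop.mean()) / 255.0

# Synthetic test image: a bright 64x64 "sign" on a dark 128x128 background.
image = np.zeros((128, 128), dtype=np.uint8)
image[32:96, 32:96] = 255

hits = detect(image, dummy_classify)
for x, y, w, h, label, score in hits:
    print(f"{label} at x={x}, y={y}, size={w}x{h}, score={score:.2f}")
```

In practice you would also repeat the sweep at several window sizes, since the sign's size in the frame is unknown, and merge overlapping hits. That extra tuning is part of why this approach gets expensive quickly.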

Some of the architectures to look up for that second option include YOLO, Single Shot MultiBox Detector (SSD), and Faster R-CNN, to name a few.
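To give an idea of what the annotations for a detection dataset look like, here is a small sketch that converts one pixel-coordinate bounding box into the normalized text label format commonly used by YOLO implementations (one line per object: class index, box center x and y, box width and height, all relative to the image size). The function name `to_yolo_line`, the class index and the box and image dimensions are assumptions for illustration; other architectures and tools use different formats, so check what your chosen framework expects.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO-style label line."""
    x_min, y_min, x_max, y_max = box
    # Box center and size, normalized to the 0..1 range by the image dimensions.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Hypothetical example: a 64x64 pixel sign (class 0) in a 1280x720 frame.
line = to_yolo_line(0, (600, 200, 664, 264), 1280, 720)
print(line)
```

Most annotation tools can export labels like this directly, so you would normally not write them by hand.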
