Is it possible to train a model from multiple datasets, one for each class?

I'm pretty new to object detection. I'm using the TensorFlow Object Detection API; I'm currently collecting datasets for my project and will use model_main.py to train my model.

I have found and converted two fairly large annotated datasets, one of cars and one of traffic lights, and made two tfrecords from them.

Now I want to fine-tune a pretrained model, but I'm wondering whether this will work. An image such as "001.jpg" from the car dataset will of course have some annotated bounding boxes for cars, but if it also contains a traffic light, that traffic light won't be annotated. Will this hurt training? (There can be many of these "problematic" images.) How should I handle this? Is there any workaround? (I really don't want to annotate the images again.)

Sorry if this is a silly question. Thanks for any response; links to material about this problem would be the best!

Thanks !

Upvotes: 0

Views: 1995

Answers (1)

netanel-sam

Reputation: 1912

The short answer is yes, it might be problematic, but with some effort you can make it work. Say you have two urban datasets: in the first you only have annotations for traffic lights, and in the second you only have annotations for cars. Then every car in the first dataset will be learned as a negative example, and every traffic light in the second dataset will be learned as a negative example. The two possible outcomes I can think of are:

  1. The model will not converge, since it is trying to learn contradictory things.
  2. The model will converge, but will be domain-specific. That is, it will only detect traffic lights on images from the domain of the first dataset, and only cars on images from the second. In fact I tried doing this myself in a different setup, and got this outcome.

To learn both traffic lights and cars no matter which dataset they come from, you'll need to modify your loss function. You need to tell the loss function which dataset each image comes from, and then compute the loss only on the corresponding classes (i.e. zero out the loss on the classes that are not annotated in that dataset). Returning to our example, you compute and backpropagate the loss only for traffic lights on the first dataset, and only for cars on the second.
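For illustration, here is a minimal sketch of what such per-dataset masking could look like for a sigmoid classification loss. The class indices, the CLASS_MASKS table and the masked_classification_loss helper are all made up for this example; in the Object Detection API you would have to wire something like this into the classification loss of the meta-architecture you are actually using, and carry a dataset id alongside each image (e.g. as an extra field in the tfrecord):

    import tensorflow as tf

    # Hypothetical class indices: 0 = car, 1 = traffic light.
    # Per-dataset mask saying which classes are actually annotated:
    # dataset 0 annotates only traffic lights, dataset 1 annotates only cars.
    CLASS_MASKS = tf.constant([
        [0.0, 1.0],  # dataset 0: traffic lights only
        [1.0, 0.0],  # dataset 1: cars only
    ], dtype=tf.float32)

    def masked_classification_loss(labels, logits, dataset_ids):
        """Sigmoid cross-entropy that ignores classes not annotated
        in the dataset each image came from.

        labels:      [batch, num_anchors, num_classes] float one-hot targets
        logits:      [batch, num_anchors, num_classes] raw predictions
        dataset_ids: [batch] integer id of the source dataset per image
        """
        # Per-image class mask, broadcast over anchors: [batch, 1, num_classes]
        mask = tf.gather(CLASS_MASKS, dataset_ids)[:, tf.newaxis, :]

        per_class = tf.nn.sigmoid_cross_entropy_with_logits(
            labels=labels, logits=logits)

        # Zero out the loss for classes the source dataset never annotates,
        # so unannotated objects are not pushed towards "background".
        masked = per_class * mask
        num_terms = tf.reduce_sum(tf.ones_like(per_class) * mask)
        return tf.reduce_sum(masked) / (num_terms + 1e-8)

This way, a car that happens to appear unannotated in the traffic-light dataset simply contributes nothing to the car class, instead of being pushed towards background.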

For completeness I will add that if resources are available, the better option is to annotate all the classes on all datasets and avoid the suggested modification, since by backpropagating only certain classes you lose the benefit of genuine negative examples for the other classes.

Upvotes: 1
