Reputation: 2422
I read that when using CNNs, we should have an approximately equal number of samples per class. I am doing binary classification, detecting pedestrians against background, so the two classes are pedestrian and background (anything that is not a pedestrian, really).
If I were to incorporate hard negative mining into my training, I would end up with more negative samples than positive ones whenever I get a lot of false positives.
1) Would this be okay?
2) If not, then how do I solve this issue?
3) And what are the consequences of training a CNN with more negative than positive samples?
4) If it is okay to have more negative than positive samples, is there a maximum ratio that I should not exceed? For example, that I should not have 3x more negative samples than positive ones.
5) I can augment my positive samples by jittering, but how many additional samples per image should I create? Is there such a thing as 'too much'? For example, if I start off with 2000 positive samples, how many additional samples would be too many? Is generating a total of 100k samples from the 2k samples via jittering too much?
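For reference, by jittering I mean something like the rough sketch below: taking random shifted crops around each positive window. It is only an illustration with numpy; the crop size, shift range and number of copies are arbitrary placeholders, and it assumes each source window is at least as large as the crop:

```python
import numpy as np

def jitter(image, crop_h=128, crop_w=64, max_shift=8, n_copies=10):
    """Generate `n_copies` randomly shifted crops around the image centre.

    `image` is an H x W (or H x W x C) numpy array at least crop_h x crop_w
    in size; the crop size and maximum shift in pixels are placeholder values.
    """
    h, w = image.shape[:2]
    cy, cx = h // 2, w // 2
    crops = []
    for _ in range(n_copies):
        # random offset of the crop window around the centre
        dy = np.random.randint(-max_shift, max_shift + 1)
        dx = np.random.randint(-max_shift, max_shift + 1)
        top = np.clip(cy + dy - crop_h // 2, 0, h - crop_h)
        left = np.clip(cx + dx - crop_w // 2, 0, w - crop_w)
        crops.append(image[top:top + crop_h, left:left + crop_w])
    return crops
```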
Upvotes: 1
Views: 55
Reputation: 40516
It depends on which cost function you use, but if you set it to be log_loss, then I can show you intuitively how an unbalanced dataset may harm your training and what the possible solutions to this problem are:
a. If you don't change the distribution of your classes and leave them unbalanced, then, provided your model is able to achieve a relatively small value of the loss function, it will not only be a good detector of pedestrians in an image, but it will also learn that a pedestrian is a relatively rare event, which may save you from a lot of false positives. So if you can afford to spend a lot more time training a bigger model, this may bring you really good results.
b. If you change the distribution of your classes, then you could probably achieve relatively good results with a much smaller model and in a shorter time, but on the other hand, because your classifier will have learned a different distribution, you may end up with a lot of false positives.
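To make (b) concrete, here is a minimal sketch of what changing the distribution could look like: oversampling the positive class by duplicating it a given number of times before training. The arrays and the factor are hypothetical placeholders, not a fixed recipe:

```python
import numpy as np

def rebalance(X, y, factor):
    """Oversample the positive class by duplicating it `factor` times.

    X : array of samples, y : array of 0/1 labels (1 = pedestrian).
    Returns a shuffled dataset in which each positive appears `factor` times.
    """
    pos = X[y == 1]
    neg = X[y == 0]
    X_new = np.concatenate([neg] + [pos] * factor)
    y_new = np.concatenate([np.zeros(len(neg))] + [np.ones(len(pos))] * factor)
    idx = np.random.permutation(len(y_new))  # shuffle so batches stay mixed
    return X_new[idx], y_new[idx]
```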
But if the training phase of your classifier does not last too long, you may find a good compromise between these two methods. You may treat the multiplication factor (i.e. by how many times, e.g. 2, 3 or n, you increase the number of positive samples) as a meta-parameter and optimise its value, e.g. using a grid search scheme.
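A rough sketch of that grid search, reusing the `rebalance` helper from the sketch above. Here `train_model`, `evaluate`, `X_train`, `y_train`, `X_val` and `y_val` are placeholders for your own CNN training and validation code; the important detail is that the validation set keeps the natural, unbalanced distribution, so the false-positive cost shows up in the score:

```python
# Treat the oversampling factor as a meta-parameter and pick the best one.
# train_model / evaluate are placeholders for your own training and validation.
best_factor, best_score = None, float("-inf")
for factor in [1, 2, 3, 5, 10]:                # candidate multiplication factors
    X_bal, y_bal = rebalance(X_train, y_train, factor)
    model = train_model(X_bal, y_bal)          # your CNN training routine
    score = evaluate(model, X_val, y_val)      # validate on the natural distribution
    if score > best_score:
        best_factor, best_score = factor, score
```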
Upvotes: 1