Kong

Reputation: 2422

Some simple questions regarding the training of CNNs

I read that when using CNNs, we should have an approximately equal number of samples per class. I am doing binary classification, detecting pedestrians against background, so the two classes are pedestrian and background (anything that is not a pedestrian).

If I were to incorporate hard negative mining in my training, I would end up with more negative samples than positive ones whenever I get a lot of false positives.

1) Would this be okay?

2) If not then how do I solve this issue?

3) And what are the consequences of training a CNN with more negative than positive samples?

4) If it is okay to have more negative than positive samples, is there a maximum ratio that I should not exceed? For example, should I avoid having more than 3x as many negative samples as positives?

5) I can augment my positive samples by jittering, but how many additional samples per image should I create? Is there such a thing as 'too much'? For example, if I start with 2,000 positive samples, how many additional samples are too many? Is generating a total of 100k samples from the 2k via jittering too much? (A minimal sketch of the kind of jittering I mean is below.)
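For reference, this is roughly the kind of jittering I mean (a minimal NumPy sketch with random shifts and horizontal flips; the crop size and parameter values are just placeholders):

```python
# Minimal jittering sketch (illustrative only): random shifts and
# horizontal flips of one positive crop. `n_copies` controls how many
# extra samples are generated per original image.
import numpy as np

def jitter(image, n_copies=10, max_shift=4, seed=0):
    """Return n_copies randomly shifted / flipped variants of image (H x W x C)."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_copies):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
        if rng.random() < 0.5:          # random horizontal flip
            shifted = shifted[:, ::-1]
        out.append(shifted)
    return out

# Example: 50 jittered copies of a single 64x32 pedestrian crop.
crop = np.zeros((64, 32, 3), dtype=np.uint8)
augmented = jitter(crop, n_copies=50)
print(len(augmented), augmented[0].shape)
```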

Upvotes: 1

Views: 55

Answers (1)

Marcin Możejko

Reputation: 40516

It depends on which cost function you use, but if you use log loss (binary cross-entropy), I can show you intuitively how an unbalanced dataset may harm your training and what the possible solutions are:

a. If you don't change the distribution of your classes and leave them unbalanced, then, provided your model is able to reach a relatively small loss value, it will not only be a good pedestrian detector but will also learn that a pedestrian is a relatively rare event, which may save you from a lot of false positives. So if you are able to spend more time training a bigger model, this may give you really good results.

b. If you change the distribution of your classes, you can probably achieve relatively good results with a much smaller model in a shorter time, but, because your classifier will have learned a different class distribution, you may end up with a lot of false positives.

If the training phase of your classifier does not take too long, you may find a good compromise between these two approaches: treat the multiplication factor (e.g. whether you increase the number of positive samples 2, 3 or n times) as a hyperparameter and optimise its value, e.g. with a grid search.
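For example, here is a rough sketch of what that grid search could look like (a scikit-learn logistic regression stands in for your CNN just to keep the example self-contained; the toy data and factor values are arbitrary placeholders):

```python
# Grid search over the positive-sample multiplication factor (illustrative
# sketch only): for each factor, replicate the positives, train a stand-in
# classifier, and keep the factor with the best validation log loss.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# Toy stand-in data: 64-dim features, 1 = pedestrian, 0 = background.
X_pos = rng.normal(1.0, 1.0, size=(500, 64))
X_neg = rng.normal(0.0, 1.0, size=(5000, 64))
X_val = np.vstack([rng.normal(1.0, 1.0, size=(200, 64)),
                   rng.normal(0.0, 1.0, size=(2000, 64))])
y_val = np.array([1] * 200 + [0] * 2000)

best_factor, best_loss = None, np.inf
for factor in (1, 2, 3, 5):                 # how many times to replicate positives
    X_train = np.vstack([X_pos] * factor + [X_neg])
    y_train = np.array([1] * (factor * len(X_pos)) + [0] * len(X_neg))
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    loss = log_loss(y_val, clf.predict_proba(X_val)[:, 1])
    print(f"factor={factor}: validation log loss={loss:.4f}")
    if loss < best_loss:
        best_factor, best_loss = factor, loss

print("best multiplication factor:", best_factor)
```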

Upvotes: 1
