NASNet-A fine tuning poor validation accuracy

Question

I have a dataset of roughly 34000 images split into two sets: train (30000 images) and validation (4000 images). Each image is the difference between two frames taken from a video (the time offset between the frames in each pair is about 1 second). The videos have a static background, so the diff images are mostly black, with only one or two small colored regions. Each diff image has a label (an action occurred or not: 1 or 0), so this is binary classification.

I'm using the slim models pretrained on ImageNet and fine-tuning them on my dataset. I launched five separate training runs with five different networks: InceptionV4, InceptionResnetV2, Resnet152, NASNet-mobile and NASNet. I got very good results with the first four, but not with NASNet: the area under the ROC curve on the validation set is always 0.5, and the logits of the validation images all have roughly the same values, which is really weird. In fact, I saw the same behavior with NASNet-mobile during the first 10000 mini-batches, but after that the model did converge. Here are the values of the hyperparameters I have in my script:

batch_size = 10
weight_decay = 0.00004
optimizer = rmsprop
rmsprop_momentum = 0.9
rmsprop_decay = 0.9

learning_rate_decay_type = exponential
learning_rate = 0.01
learning_rate_decay_factor = 0.94
num_epochs_per_decay = 2.0 #'Number of epochs after which learning rate
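
For reference, the learning-rate schedule implied by these values can be worked out by hand. The sketch below is plain Python, not the slim training script: the function names are hypothetical, and it assumes the decay interval is computed from the 30000 training images and that the decay is applied in discrete steps (staircase), which may not match the actual script.

```python
def decay_steps(num_train_samples, batch_size, num_epochs_per_decay):
    # Number of optimizer steps between two learning-rate decays.
    return int(num_train_samples / batch_size * num_epochs_per_decay)


def exponential_decay(initial_lr, decay_factor, step, steps_per_decay, staircase=True):
    # staircase=True (an assumption here) decays in discrete jumps;
    # staircase=False decays smoothly at every step.
    if staircase:
        exponent = step // steps_per_decay
    else:
        exponent = step / steps_per_decay
    return initial_lr * decay_factor ** exponent


if __name__ == "__main__":
    interval = decay_steps(30000, 10, 2.0)
    for step in (0, 10000, 60000):
        print(step, exponential_decay(0.01, 0.94, step, interval))
```

Under these assumptions the rate is multiplied by 0.94 once every 6000 steps, so it is still at its initial value of 0.01 for the first 6000 mini-batches.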

I'm still new to TensorFlow and did not find anything related anywhere else. This behavior is really weird because I'm using the same parameters and the same inputs for every network, yet with NASNet something seems to go wrong. I'm not only looking for a solution (if one is possible; I know it is tough to troubleshoot such things without many details about the model); insights about where to look and how to troubleshoot would also be great. Has anybody had this problem when fine-tuning NASNet before, for example the model not converging? I know it is hard to get answers to such questions, but I hope to get at least some insights so I can move forward with my investigation.
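
The two symptoms described above are linked: when every validation image gets (nearly) the same score, the scores carry no ranking information, and a rank-based AUC comes out at 0.5. The sketch below illustrates this with a small pairwise AUC computation in plain Python. The function name roc_auc is hypothetical and this is not the metric code used in the training script.

```python
def roc_auc(labels, scores):
    # AUC as the probability that a random positive is scored above a
    # random negative, with ties counted as half a win.
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    labels = [0, 1, 0, 1]
    print(roc_auc(labels, [0.3, 0.3, 0.3, 0.3]))  # identical scores for every image
    print(roc_auc(labels, [0.1, 0.8, 0.2, 0.9]))  # scores that separate the classes
```

So an AUC stuck at 0.5 together with near-constant logits suggests the network output has stopped depending on the input, rather than the model being merely inaccurate.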

EDIT: Here are the plots of the cross entropy and regularization losses:

EDIT: As proposed in the answer, I set the drop_path_keep_prob parameter to 1, and now the model converges and I get good accuracy on the validation set. But now the question is: what does this parameter mean? Is it one of the parameters that we should adapt to our dataset (like the learning rate, etc.)?

gngdb · Accepted Answer

The simplest sanity check you can do would be to run the finetuning on a single minibatch. Any deep network should be able to overfit to that, if there aren't any big problems. If you see that it can't do that, then there must be some problem with the definition, or the way you're using the definition.
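
The shape of that check can be sketched without TensorFlow. The example below trains a tiny logistic-regression model on one fixed batch and records the loss. The model, the random toy batch and all function names are hypothetical stand-ins for the real network and a real minibatch of 10 images. The point is only the procedure: feed the same batch at every step and confirm the loss heads toward zero.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def batch_loss(weights, bias, batch):
    # Mean binary cross-entropy over the fixed batch.
    total = 0.0
    for features, label in batch:
        p = min(max(predict(weights, bias, features), 1e-12), 1.0 - 1e-12)
        total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total / len(batch)


def overfit_single_batch(batch, steps=2000, learning_rate=0.5):
    # Train on the SAME batch at every step and record the loss after each update.
    weights = [0.0] * len(batch[0][0])
    bias = 0.0
    losses = [batch_loss(weights, bias, batch)]
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in batch:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(batch) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(batch)
        losses.append(batch_loss(weights, bias, batch))
    return losses


def make_toy_batch(batch_size=10, num_features=20, seed=0):
    # Hypothetical stand-in for one real minibatch of (features, label) pairs.
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(num_features)], rng.randint(0, 1))
            for _ in range(batch_size)]


if __name__ == "__main__":
    losses = overfit_single_batch(make_toy_batch())
    print("first loss: %.4f, last loss: %.4f" % (losses[0], losses[-1]))
```

In the real setup the equivalent is to point the training script at a dataset containing a single minibatch and watch the cross-entropy loss. If it stays near log(2), about 0.69 for two classes, the problem is in the model definition or its configuration, not in the data.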

The only guess I have in your case is that it could be something to do with the drop_path implementation. It's disabled in the mobile version, but it is enabled during training on the large model. It could make the model unstable enough that it wouldn't fine tune, so it may be worth trying to train with it disabled.
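
To give an idea of what this parameter controls, here is a minimal sketch of drop path as generally described for NASNet. During training, each example's activations on a given path are kept with probability drop_path_keep_prob and zeroed otherwise, and the kept ones are divided by the keep probability so the expected value is unchanged. This is plain Python on nested lists, not the slim implementation. The function signature is hypothetical, the value 0.7 is only illustrative, and the real code may also vary the keep probability by layer and by training step, which is not modeled here.

```python
import random


def drop_path(batch, keep_prob, is_training=True, rng=random):
    # batch: list of per-example activation lists for ONE path in a cell.
    if not is_training or keep_prob >= 1.0:
        # keep_prob = 1 disables drop path: activations pass through unchanged.
        return [list(example) for example in batch]
    output = []
    for example in batch:
        # One keep/drop decision per example, shared by all of its activations.
        keep = rng.random() < keep_prob
        scale = 1.0 / keep_prob if keep else 0.0
        output.append([value * scale for value in example])
    return output


if __name__ == "__main__":
    activations = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(drop_path(activations, keep_prob=1.0))
    print(drop_path(activations, keep_prob=0.7, rng=random.Random(0)))
```

With a keep probability of 1 the function is the identity, which is what disabling drop path amounts to. It is a regularization setting in the same family as dropout, so whether and how strongly to use it when fine-tuning is a judgment call that depends on the dataset.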
