TensorFlow API Slim: How to set checkpoint_exclude_scopes and output_node_names for VGG-Net 16?

Question

I am currently trying to train classification networks using the TensorFlow Slim API (https://github.com/tensorflow/models). After creating TFRecords for my dataset (stored in research/slim/data), I train the networks using the following command:

python research/slim/train_image_classifier.py \
--train_dir=research/slim/training/current_model \
--dataset_name=my_dataset \
--dataset_split_name=train \
--dataset_dir=research/slim/data \
--model_name=vgg_16 \
--checkpoint_path=research/slim/training/vgg_16_2016_08_28/vgg_16.ckpt \
--checkpoint_exclude_scopes=vgg_16/fc7,vgg_16/fc8 \
--trainable_scopes=vgg_16/fc7,vgg_16/fc8 \
--batch_size=5 \
--log_every_n_steps=10 \
--max_number_of_steps=1000

This works well for several classification networks (Inception, ResNet, MobileNet), but not as well for VGG-Net. I am fine-tuning the following VGG-Net 16 model: http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz

Training this model runs without errors, but the loss increases instead of decreasing. This may be due to my choice of 'checkpoint_exclude_scopes'.

Is it correct to use the last fully connected layers (fc7 and fc8) for checkpoint_exclude_scopes?

The same question arises when freezing the graph, for the parameter 'output_node_names'. For InceptionV3, for example, it works with 'output_node_names=InceptionV3/Predictions/Reshape_1'. But how should this parameter be set for VGG-Net? I tried the following:

python research/slim/freeze_graph.py \
--input_graph=research/slim/training/current_model/graph.pb \
--input_checkpoint=research/slim/training/current_model/model.ckpt \
--input_binary=true \
--output_graph=research/slim/training/current_model/frozen_inference_graph.pb \
--output_node_names=vgg_16/fc8

I didn't find any layer containing 'Predictions' or 'Logits' in the VGG-Net model, so I am not sure.
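One way to check which node names actually exist under a scope is to list them from the graph file before freezing. The following is a minimal sketch, not part of the Slim scripts: load_node_names and nodes_under_scope are hypothetical helper names, the node names in example_nodes are placeholders rather than names read from a real VGG-16 graph, and the loader assumes a TensorFlow version that provides tf.compat.v1.GraphDef and tf.io.gfile.GFile.

```python
def load_node_names(graph_pb_path):
    """Read a binary GraphDef file and return the names of its nodes.

    TensorFlow is imported lazily so the filtering helper below can be
    used on a plain list of names without TensorFlow installed.
    """
    import tensorflow as tf

    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(graph_pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return [node.name for node in graph_def.node]


def nodes_under_scope(node_names, scope):
    """Return the node names equal to `scope` or nested below it."""
    prefix = scope.rstrip("/") + "/"
    return [n for n in node_names if n == scope or n.startswith(prefix)]


# Placeholder node names for illustration only (an assumption, not taken
# from a real graph.pb); a real graph contains many more nodes.
example_nodes = [
    "input",
    "vgg_16/fc7/Relu",
    "vgg_16/fc8/BiasAdd",
    "vgg_16/fc8/squeezed",
]
candidates = nodes_under_scope(example_nodes, "vgg_16/fc8")
print(candidates)
```

With a real model, the list would come from load_node_names("research/slim/training/current_model/graph.pb"), and the value passed to output_node_names should be one of the names that is actually printed.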

Thank you for helping!

Anju Paul - Intel · Accepted Answer

I ran train_image_classifier.py with your command, with the following changes:

Changed train_dir, dataset_dir and checkpoint_path to my local paths.
Since I ran on CPU, added the --clone_on_cpu=True parameter to the command.
Removed the parameter dataset_name=my_dataset, since it was throwing an error for me.

It ran fine. The loss started as high as 448, then decreased slowly and reached 3.5 by step 1000. It fluctuated considerably, but the overall trend was downward. I am not sure why you did not see the same behavior.

Regarding your question about checkpoint_exclude_scopes for training and output_node_names for freezing the graph, I think your choice of layers is fine. However, I would prefer to train only the last fully connected layer (fc8) for faster convergence.
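To illustrate what restricting the scopes to fc8 changes, here is a simplified sketch of how comma-separated scope flags select variables. It assumes plain prefix matching on variable names, which approximates what the training script does but is not its actual code. The function names are hypothetical, and the VGG16_VARS list is an illustrative subset of names, not read from the checkpoint.

```python
def split_scopes(flag_value):
    """Turn a comma-separated flag value into a list of scope prefixes."""
    return [s.strip() for s in flag_value.split(",") if s.strip()]


def variables_to_restore(var_names, exclude_scopes):
    """Keep variables whose name does not start with any excluded scope.

    Excluded variables are not loaded from the checkpoint, so they keep
    their fresh initial values.
    """
    exclusions = split_scopes(exclude_scopes)
    return [v for v in var_names
            if not any(v.startswith(e) for e in exclusions)]


def variables_to_train(var_names, trainable_scopes):
    """Keep variables whose name starts with one of the trainable scopes."""
    scopes = split_scopes(trainable_scopes)
    return [v for v in var_names
            if any(v.startswith(s) for s in scopes)]


# Illustrative subset of VGG-16 variable names (an assumption for this sketch).
VGG16_VARS = [
    "vgg_16/conv5/conv5_3/weights",
    "vgg_16/fc6/weights",
    "vgg_16/fc7/weights",
    "vgg_16/fc8/weights",
    "vgg_16/fc8/biases",
]

# Only fc8 is excluded and trained, so fc7 is restored from the checkpoint.
restored = variables_to_restore(VGG16_VARS, "vgg_16/fc8")
trained = variables_to_train(VGG16_VARS, "vgg_16/fc8")
print(restored)
print(trained)
```

With --checkpoint_exclude_scopes=vgg_16/fc8 and --trainable_scopes=vgg_16/fc8, fc7 keeps its pretrained weights and only the new classification layer starts from a fresh initialization. With the original vgg_16/fc7,vgg_16/fc8 setting, both layers start from scratch.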
