Reputation: 1947
I am trying to understand exactly what the fields in the TensorFlow Object Detection config do.
Following this article (https://medium.com/@jonathan_hui/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359) on balancing accuracy and speed, I changed first_stage_max_proposals from the original 100 to 50.
The good news is that it did reduce inference latency (from 4.2 seconds to 2.2 seconds per image); the bad news is that it also reduced accuracy.
When I then raised max proposals from 50 to 70, accuracy improved.
So I want to know exactly what max proposals controls. Is it related to other config fields such as max_detections_per_class or max_total_detections?
I searched a lot but found very little about this field. I am using Python 3.6.4 and TensorFlow 1.8.0, and here is my model config:
model {
  faster_rcnn {
    num_classes: 3
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 670
        max_dimension: 1013
      }
    }
    feature_extractor {
      type: "faster_rcnn_resnet101"
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        height_stride: 16
        width_stride: 16
        scales: 0.25
        scales: 0.5
        scales: 1.0
        scales: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 1.0
        aspect_ratios: 2.0
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 70
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        use_dropout: false
        dropout_keep_probability: 1.0
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.3
        iou_threshold: 0.6
        max_detections_per_class: 30
        max_total_detections: 30
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_batch_size: 70
  }
}
train_config {
  batch_size: 1
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  optimizer {
    momentum_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0003
          decay_steps: 2000
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "d:/od/tool/faster_rcnn3/model.ckpt"
  from_detection_checkpoint: true
}
train_input_reader {
  label_map_path: "d:/od/project/train_allinone/file/labelmap.pbtxt"
  tf_record_input_reader {
    input_path: "d:/od/project/train_allinone/file/tf.record"
  }
}
Any explanation would be greatly appreciated.
Thanks.
Upvotes: 0
Views: 2548
Reputation: 59731
Looking at faster_rcnn.proto:
// Naming conventions:
// Faster R-CNN models have two stages: a first stage region proposal network
// (or RPN) and a second stage box classifier. We thus use the prefixes
// `first_stage_` and `second_stage_` to indicate the stage to which each
// parameter pertains when relevant
And so:
// Maximum number of RPN proposals retained after first stage postprocessing.
optional int32 first_stage_max_proposals = 15 [default=300];
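To make "retained after first stage postprocessing" concrete, here is a rough pure-Python sketch of what such a cap does. It is not the library's actual code: `select_proposals` and `iou` are hypothetical helper names, and the exact threshold comparisons are my assumption. The idea is that RPN boxes are sorted by objectness score, overlapping boxes are suppressed (first_stage_nms_iou_threshold), and selection stops once max_proposals boxes have been kept.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, max_proposals,
                     iou_threshold=0.7, score_threshold=0.0):
    """Greedy NMS that keeps at most `max_proposals` boxes (illustrative only).

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Scores are sorted, so everything after this point is below threshold too.
        if scores[i] < score_threshold:
            break
        # This is the cap that first_stage_max_proposals corresponds to.
        if len(kept) == max_proposals:
            break
        # Keep the box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.6]
print(select_proposals(boxes, scores, max_proposals=2))
print(select_proposals(boxes, scores, max_proposals=10))
```

With a small cap, lower-scoring regions are dropped before the second stage ever sees them, so any object that only appears in those regions cannot be detected.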
Faster R-CNN has two networks: the first proposes regions where objects may be found, and the second classifies and refines each of those regions. Keeping more proposals from the first network generally improves accuracy, because fewer objects are missed, but it also means more computation, because the second network has to process every retained region. For a quick explanation of how Faster R-CNN works, check out Faster R-CNN Explained; for the full picture, see the original publication: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
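As for how this relates to max_detections_per_class and max_total_detections: those fields cap the output of the second stage, whereas first_stage_max_proposals caps its input. The sketch below is only an illustration under simplifying assumptions: `cap_detections` is a hypothetical name, and the per-class box NMS (iou_threshold) that the real post-processing performs is left out so that only the counting is visible.

```python
def cap_detections(class_scores, score_threshold,
                   max_detections_per_class, max_total_detections):
    """Apply the two second-stage caps to per-proposal class scores.

    class_scores[p][c] is the score of proposal p for class c.
    Returns (proposal_index, class_index, score) tuples, best first.
    Box-overlap suppression is deliberately omitted in this sketch.
    """
    num_classes = len(class_scores[0]) if class_scores else 0
    candidates = []
    for c in range(num_classes):
        # Candidates for this class: every proposal scoring above the threshold.
        per_class = [(p, c, s[c]) for p, s in enumerate(class_scores)
                     if s[c] >= score_threshold]
        per_class.sort(key=lambda d: d[2], reverse=True)
        # First cap: at most max_detections_per_class boxes for each class.
        candidates.extend(per_class[:max_detections_per_class])
    candidates.sort(key=lambda d: d[2], reverse=True)
    # Second cap: at most max_total_detections boxes across all classes.
    return candidates[:max_total_detections]


# Three proposals from the first stage, two classes.
scores = [[0.9, 0.1], [0.5, 0.6], [0.2, 0.8]]
print(cap_detections(scores, 0.3, max_detections_per_class=1,
                     max_total_detections=30))
print(cap_detections(scores, 0.3, max_detections_per_class=30,
                     max_total_detections=3))
```

Since every detection originates from a proposal, the number of candidates can never exceed the number of proposals times the number of classes. Lowering first_stage_max_proposals therefore shrinks the pool that the two detection caps choose from, but the fields are otherwise independent settings.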
Upvotes: 2