Wesley

Reputation: 1947

What exactly does max proposals mean in TensorFlow object detection?

I am trying to understand the TensorFlow object detection config fields in detail.

Following this article (https://medium.com/@jonathan_hui/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359), I changed first_stage_max_proposals from the original 100 to 50 to get a better balance between accuracy and speed.

The good news is that it did reduce inference latency (from 4.2 seconds to 2.2 seconds per image). The bad news is that it also decreased accuracy.

Then I changed max proposals from 50 to 70, and accuracy improved.

So I want to know exactly what max proposals controls. Is it related to other config fields such as max_detections_per_class or max_total_detections?

I googled a lot but found very little about this setting. I use Python 3.6.4 and TensorFlow 1.8.0, and here is my model config:

model {
  faster_rcnn {
    num_classes: 3
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension:  670
        max_dimension: 1013
      }
    }
    feature_extractor {
      type: "faster_rcnn_resnet101"
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        height_stride: 16
        width_stride: 16
        scales: 0.25
        scales: 0.5
        scales: 1.0
        scales: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 1.0
        aspect_ratios: 2.0
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 70
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        use_dropout: false
        dropout_keep_probability: 1.0
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.3
        iou_threshold: 0.6
        max_detections_per_class: 30
        max_total_detections: 30
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_batch_size: 70
  }
}
train_config {
  batch_size: 1
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  optimizer {
    momentum_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0003
          decay_steps: 2000
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "d:/od/tool/faster_rcnn3/model.ckpt"
  from_detection_checkpoint: true
}
train_input_reader {
  label_map_path: "d:/od/project/train_allinone/file/labelmap.pbtxt"
  tf_record_input_reader {
    input_path: "d:/od/project/train_allinone/file/tf.record"
  }
}

Any explanation of this is greatly appreciated.

Thanks.

Upvotes: 0

Views: 2548

Answers (1)

javidcf

Reputation: 59731

Looking at faster_rcnn.proto:

// Naming conventions:
// Faster R-CNN models have two stages: a first stage region proposal network
// (or RPN) and a second stage box classifier.  We thus use the prefixes
// `first_stage_` and `second_stage_` to indicate the stage to which each
// parameter pertains when relevant

And so:

// Maximum number of RPN proposals retained after first stage postprocessing.
optional int32 first_stage_max_proposals = 15 [default=300];

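Faster R-CNN has two networks: the first proposes regions where objects may be found, and the second tries to detect objects in those regions. Increasing the number of proposals from the first network tends to increase accuracy but means more computational work, because the second network has to examine more candidate regions. That matches what you observed: fewer proposals gave lower latency and lower accuracy. The max_detections_per_class and max_total_detections fields sit under second_stage_post_processing, so they limit the final detections produced by the second stage, not the number of proposals.

For a quick explanation of how Faster R-CNN works, check out Faster R-CNN Explained. If you want the full picture, you can look at the original publication: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.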
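To make the role of the cap concrete, here is a minimal pure-Python sketch of first stage postprocessing as I understand it: greedy non-max suppression over the RPN boxes that stops once max_proposals boxes have been kept. This is not the library's code. The names `iou` and `select_proposals` are hypothetical, and the default values are simply copied from the config in the question.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, iou_threshold=0.7, max_proposals=70):
    """Hypothetical helper: greedy NMS that keeps at most max_proposals boxes.

    Returns the indices of the kept boxes, highest score first.
    """
    # Visit boxes from the highest objectness score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if len(kept) == max_proposals:
            break  # the cap: lower-scoring boxes never reach the second stage
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [
    (0.0, 0.0, 10.0, 10.0),
    (0.5, 0.5, 10.5, 10.5),    # near-duplicate of the first box
    (20.0, 20.0, 30.0, 30.0),
    (40.0, 40.0, 50.0, 50.0),
]
scores = [0.9, 0.8, 0.7, 0.6]

print(select_proposals(boxes, scores, max_proposals=2))
print(select_proposals(boxes, scores, max_proposals=10))
```

With a cap of 2, the box at index 3 is dropped even though it overlaps nothing, so the second stage never gets a chance to classify it. That is the accuracy cost of a small cap, and the saving is that the second stage processes fewer regions.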

Upvotes: 2
