Problem detecting large number of objects in single image with Tensorflow Object Detection API

I need to detect large numbers of objects from two classes in a single image. I've had some success with the TensorFlow Object Detection API by retraining the faster_rcnn_inception_resnet_v2_atrous_coco network from the Object Detection Model Zoo with the following config file:

model {
  faster_rcnn {
    num_classes: 2
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 2000
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 1000
        max_total_detections: 1000
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/path/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/path/train.record"
  }
  label_map_path: "/path/label_map.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/path/val.record"
  }
  label_map_path: "/path/label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

However, on an Nvidia M10 with 8 GB of memory, I only get detections on roughly the top half of the image.

This pattern is consistent across many images: some have a few bounding boxes lower down, but none have bounding boxes accurately distributed throughout the image. My first thought was a memory problem, so I tried running detection on a GPU with more memory (an Nvidia V100 with 32 GB). I raised first_stage_max_proposals from 2000 to 4000 and max_detections_per_class/max_total_detections from 1000 to 2000 (on the 8 GB GPU, these settings caused an Aborted (core dumped) error). The results were only marginally better.

I tried raising the first_stage_max_proposals to 8000 and the max_detections_per_class/max_total_detections to 4000, but this led to an Aborted (core dumped) error on the 32 GB GPU.
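For a rough sense of scale, the sketch below estimates what the image_resizer and first_stage_anchor_generator settings in the config above imply. It computes the size an image is resized to and how many anchors the first stage scores before NMS and first_stage_max_proposals cut them down to a few thousand proposals. It is an approximation under stated assumptions: the resize follows the keep-aspect-ratio rule (short side to min_dimension unless the long side would exceed max_dimension), and the feature map is taken to be the resized size divided by the stride, rounded up. The API's exact rounding and padding may differ slightly. The function names are hypothetical.

```python
import math
from typing import Tuple


def keep_aspect_ratio_size(height: int, width: int,
                           min_dimension: int = 600,
                           max_dimension: int = 1024) -> Tuple[int, int]:
    """Approximate keep_aspect_ratio_resizer: scale the short side to
    min_dimension unless that pushes the long side past max_dimension,
    in which case scale the long side to max_dimension instead."""
    scale = min_dimension / min(height, width)
    if max(height, width) * scale > max_dimension:
        scale = max_dimension / max(height, width)
    return round(height * scale), round(width * scale)


def approx_anchor_count(height: int, width: int, stride: int = 8,
                        num_scales: int = 4, num_aspect_ratios: int = 3) -> int:
    """grid_anchor_generator places num_scales * num_aspect_ratios anchors
    on every feature-map cell; the feature map is assumed to be
    ceil(size / stride) cells along each axis."""
    cells = math.ceil(height / stride) * math.ceil(width / stride)
    return cells * num_scales * num_aspect_ratios


# Hypothetical 3000 x 4000 (height x width) photo with the question's settings.
h, w = keep_aspect_ratio_size(3000, 4000)
print((h, w), approx_anchor_count(h, w))  # (600, 800) 90000
```

In this example every object also shrinks by a factor of 5 before it reaches the network, which may matter when the objects are small.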

My questions are:

1) Are these the best config settings for detecting large numbers of objects in a single image?

2) Is there a better network than faster_rcnn_inception_resnet_v2_atrous_coco for this specific task?

3) Is there an entirely different approach that's better suited to this problem?

I've considered splitting the image into smaller images and running detection on those, but if possible I'd like to keep it as one image: accurate object counts are important to my application, and objects cut by a dividing line might be counted twice or missed.
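If splitting is reconsidered, the counting concern is usually addressed by overlapping the tiles and merging the results. Below is a minimal, framework-free sketch of that idea. It is not an Object Detection API feature: tile_windows and merge_tile_detections are hypothetical helpers, running the detector on each tile is left out, and boxes are assumed to be (x1, y1, x2, y2) pixel coordinates relative to their tile. Detections touching an interior tile edge are dropped, on the assumption that with an overlap at least as large as the biggest object a neighbouring tile contains the whole object. Duplicates from overlapping tiles are then removed with greedy non-maximum suppression.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float]             # (box, score)
Window = Tuple[int, int, int, int]        # (x0, y0, x1, y1) in image pixels


def _starts(size: int, tile: int, step: int) -> List[int]:
    """Start offsets along one axis; the last tile is flush with the edge."""
    if size <= tile:
        return [0]
    starts = list(range(0, size - tile, step))
    starts.append(size - tile)
    return starts


def tile_windows(width: int, height: int, tile: int, overlap: int) -> List[Window]:
    """Tiles of at most tile x tile pixels, neighbours sharing >= overlap px."""
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than tile")
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in _starts(height, tile, step)
            for x in _starts(width, tile, step)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def merge_tile_detections(per_tile: List[Tuple[Window, List[Detection]]],
                          image_w: int, image_h: int,
                          iou_threshold: float = 0.5,
                          edge_margin: float = 2.0) -> List[Detection]:
    """Map tile detections to image coordinates and remove duplicates."""
    candidates: List[Detection] = []
    for (x0, y0, x1, y1), dets in per_tile:
        tile_w, tile_h = x1 - x0, y1 - y0
        for (bx1, by1, bx2, by2), score in dets:
            # Skip boxes touching a tile edge that is not also an image edge:
            # they are probably truncated objects.
            truncated = ((bx1 < edge_margin and x0 > 0)
                         or (tile_w - bx2 < edge_margin and x1 < image_w)
                         or (by1 < edge_margin and y0 > 0)
                         or (tile_h - by2 < edge_margin and y1 < image_h))
            if not truncated:
                # Shift from tile-local to full-image coordinates.
                candidates.append(((bx1 + x0, by1 + y0, bx2 + x0, by2 + y0), score))
    # Greedy NMS: keep the highest-scoring box, drop overlapping duplicates.
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for box, score in candidates:
        if all(iou(box, k) < iou_threshold for k, _ in kept):
            kept.append((box, score))
    return kept
```

The object count is then len(kept). The edge_margin and iou_threshold values are arbitrary starting points to tune, and since the suppression here ignores class labels, with two classes the merge would be run once per class.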

Thanks!

Answers (2)

Kashif Iqbal

I was facing the same problem, so I adjusted the config file of the Faster R-CNN Inception ResNet V2 1024x1024 model from the Model Zoo:

first_stage_max_proposals: 1500
max_detections_per_class: 1500
max_total_detections: 1500

I also added max_number_of_boxes: 1500 to the train_config, train_input_reader, and eval_input_reader blocks, and max_num_boxes_to_visualize: 1500 to the eval_config block.

This works fine for me; I now get detections for approximately 1500 objects in a single image.

Nuntipat Narkthong

Besides adjusting max_detections_per_class and max_total_detections, you need to add max_number_of_boxes to the train_config, train_input_reader, and eval_input_reader blocks and max_num_boxes_to_visualize to the eval_config block; otherwise, the ground-truth boxes will be clipped during training and evaluation. I've deployed a model for a problem similar to yours, where we tried to detect many small objects, and faster_rcnn_inception_resnet_v2_atrous_coco works quite well, so the network itself shouldn't be your problem.
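Why a low box limit hurts can be seen with a simplified model of fixed-size ground-truth padding. Each image's box list is padded or truncated to exactly max_number_of_boxes entries, so annotated objects past the limit never reach the loss. This illustrates the idea and is not the API's code; pad_or_clip_boxes is a hypothetical name, and boxes use normalized (ymin, xmin, ymax, xmax) coordinates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax), normalized


def pad_or_clip_boxes(boxes: List[Box],
                      max_number_of_boxes: int) -> Tuple[List[Box], int]:
    """Return exactly max_number_of_boxes boxes plus the number of real ones.

    Boxes beyond the limit are silently dropped; short lists are padded
    with all-zero boxes.
    """
    kept = boxes[:max_number_of_boxes]
    padding = [(0.0, 0.0, 0.0, 0.0)] * (max_number_of_boxes - len(kept))
    return kept + padding, len(kept)


# 300 annotated objects stored top to bottom, with a limit of 100.
annotations = [(i / 300, 0.1, i / 300 + 0.002, 0.12) for i in range(300)]
padded, num_real = pad_or_clip_boxes(annotations, 100)
print(len(padded), num_real)  # 100 100
```

In this model, which boxes are lost depends on the order in which they are stored. If the annotations happened to be written top to bottom, every dropped box would lie in the lower part of the image, and those objects would effectively be treated as background. That might fit the symptom in the question, but it is a hypothesis, not something verified here.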
