Reputation: 51
I'm using the TensorFlow Object Detection API to create a custom object detector. I'm using the COCO-trained models for transfer learning.
I trained it using Faster R-CNN ResNet and got very accurate results, but the inference speed of this model is very slow. I then tried training SSD MobileNet V2, which is very fast, but I'm getting very low accuracy with it. Is there anything I can change in the config file to increase the accuracy of the model? Or will the SSD model never give very accurate results because it's a lightweight model? Here's the config file I'm using right now. (I trained it on ~150 images for 10000 steps.)
```
model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
        reduce_boxes_in_lowest_layer: true
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 900
        width: 400
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_inception_v2'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  batch_size: 12
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/content/models/research/pretrained_model/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 10000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
```
Upvotes: 4
Views: 11838
Reputation: 2621
There are many things you can improve.
Typically, you want to use a small input size for SSD, e.g. 320x320, which should be at least 3x faster than your current input size. The 900x400 input also looks strange.
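As a rough check of the "at least 3x" figure, the sketch below compares pixel counts for the two input sizes. It assumes inference cost scales roughly linearly with the number of input pixels, which is only an approximation for a convolutional detector; the function name is made up for illustration.

```python
def pixel_ratio(current_hw, proposed_hw):
    """Ratio of input pixel counts: current resolution over proposed."""
    cur_h, cur_w = current_hw
    new_h, new_w = proposed_hw
    return (cur_h * cur_w) / (new_h * new_w)


# Sizes from the question's config (height 900, width 400)
# and the 320x320 suggestion above.
ratio = pixel_ratio((900, 400), (320, 320))
print(f"pixel ratio (current / proposed): {ratio:.2f}")
```

The actual speedup depends on the backbone and hardware, so it is worth timing both sizes on the target device.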
In addition, you only have 1 foreground class. You typically want to double-check the required anchors and min_size/max_size (min_scale/max_scale in your config), all of which relate to the prior boxes used in SSD. I am pretty sure that the default config, which is for MS COCO, does not fit well in many tasks. For example, if it is a car plate detection task, the plate width is much greater than the height, and thus you can safely drop those aspect_ratios <= 1.
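One way to decide which aspect_ratios to keep is to look at the width/height ratios of your own labeled boxes. A minimal sketch, assuming boxes are stored as (xmin, ymin, xmax, ymax) in pixels; the example boxes are hypothetical, not taken from the question's dataset.

```python
from statistics import median


def aspect_ratios(boxes):
    """Width/height ratio for each (xmin, ymin, xmax, ymax) box."""
    return [(xmax - xmin) / (ymax - ymin) for xmin, ymin, xmax, ymax in boxes]


# Hypothetical ground-truth boxes, made up for illustration
# (wide objects, like the car plate example).
example_boxes = [(10, 20, 110, 50), (40, 60, 160, 100), (5, 5, 95, 35)]
box_ratios = aspect_ratios(example_boxes)
print("min:", min(box_ratios), "median:", median(box_ratios), "max:", max(box_ratios))
```

If nearly all ratios sit well above 1, as here, the anchors with ratios of 0.5 and 0.3333 are unlikely to match anything.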
In addition, min_size and max_size are also important. If you use the default settings, you will have anchor boxes that are even bigger than your input image. Is this something you expect? If not, you want to adjust these settings too.
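To see which anchors exceed the image, you can tabulate their normalized sizes. The sketch below assumes scales are spaced linearly between min_scale and max_scale across the layers, and that an anchor with aspect ratio ar has width scale * sqrt(ar) and height scale / sqrt(ar). It is a simplification: it ignores the extra interpolated-scale anchor per layer and the reduce_boxes_in_lowest_layer special case.

```python
import math


def ssd_scales(min_scale, max_scale, num_layers):
    """Linearly spaced anchor scales, one per feature-map layer."""
    step = (max_scale - min_scale) / (num_layers - 1)
    return [min_scale + step * i for i in range(num_layers)]


def anchor_size(scale, aspect_ratio):
    """Normalized (width, height); 1.0 spans the full image side."""
    root = math.sqrt(aspect_ratio)
    return scale * root, scale / root


# Values copied from the question's ssd_anchor_generator block.
scales = ssd_scales(0.2, 0.95, 6)
config_aspect_ratios = [1.0, 2.0, 0.5, 3.0, 0.3333]

for scale in scales:
    for ar in config_aspect_ratios:
        w, h = anchor_size(scale, ar)
        flag = "  <- exceeds image" if max(w, h) > 1.0 else ""
        print(f"scale={scale:.2f} ar={ar:.4f} w={w:.2f} h={h:.2f}{flag}")
```

With these values, the widest anchor at scale 0.95 and aspect ratio 3.0 is more than 1.6 times the image width, which is the effect described above.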
Furthermore, you want to dig into which data augmentations fit your problem best. Recently, auto augmentation has also been added.
Finally, you can always boost your performance by using new losses, e.g. focal loss for classification.
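To illustrate why focal loss can help with one foreground class and many background anchors, here is a minimal per-prediction sketch of the sigmoid focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults alpha=0.25 and gamma=2.0 are the commonly quoted ones, not values from the question. As far as I know, the Object Detection API exposes this as weighted_sigmoid_focal under classification_loss, but check the proto definitions for your version.

```python
import math


def sigmoid_focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for a single binary prediction; target is 0 or 1.

    Illustration only: not numerically stable for extreme logits.
    """
    p = 1.0 / (1.0 + math.exp(-logit))
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background anchor (confidently negative) versus a hard one
# (scored as foreground although the target is background).
easy_negative = sigmoid_focal_loss(-4.0, target=0)
hard_negative = sigmoid_focal_loss(2.0, target=0)
print("easy negative:", easy_negative)
print("hard negative:", hard_negative)
```

The easy negative contributes a far smaller loss than the hard one, so the many trivial background anchors no longer dominate training.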
Upvotes: 0
Reputation: 27
It is very difficult to get high accuracy from a model that was designed to run on mobile phones.
My suggestion is to use the high accuracy model and improve the inference time. Convert the model to TensorRT.
https://github.com/tensorflow/tensorrt/tree/master/tftrt/examples/object_detection
Upvotes: 2
Reputation: 705
You can increase the number of steps:
num_steps: 2000000
If the loss then settles at around 1 or 2 and the predictions are still not satisfactory, further training will not help with this model, and you can try a different one. You could also refer to the list of COCO-trained models and choose one with a higher COCO mAP[^1] and a lower Speed (ms).
You can try different models and see what works best for your application.
If the problem still persists, you could try increasing the number of training images.
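Before raising num_steps, it may help to check how many passes over the data the current settings already make and how much the learning rate has decayed by then. A small sketch using the numbers from the question (10000 steps, batch size 12, ~150 images, decay_steps 800720); it assumes the continuous (non-staircase) form of exponential decay, and with staircase decay the rate would not change at all before decay_steps is reached.

```python
def epochs_seen(num_steps, batch_size, num_images):
    """Approximate passes over the dataset, ignoring augmentation."""
    return num_steps * batch_size / num_images


def decayed_learning_rate(initial_lr, decay_factor, decay_steps, step):
    """Continuous exponential decay of the learning rate."""
    return initial_lr * decay_factor ** (step / decay_steps)


epochs = epochs_seen(10_000, 12, 150)
final_lr = decayed_learning_rate(0.004, 0.95, 800_720, 10_000)
print("approximate epochs:", epochs)
print("learning rate at step 10000:", final_lr)
```

With these values the model has already seen each image hundreds of times while the learning rate has barely moved, so it is worth looking at the loss curve before committing to a much longer run.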
Upvotes: 1