Reputation: 11
This is my first model. I'm new to Python and this is my second post on Stack Overflow, so please let me know if there is anything I should elaborate on, and keep in mind that there could be an easy solution to my problems.
EDIT: I have found a way to test the model on images after training (when it is a .pb file) and this gives great results, so I'm now focusing the question on the last part: the conversion to IR and blob. /EDIT
The problem to solve with the model is easiest to explain with an example. Imagine a parking lot with 8 parking spots paired two by two. [simple sketch of the parking lot]
The idea is to know which parking spots are occupied and which are not. As the camera is located "from the side", I can't simply mark the parking spots, because if, for example, spot 6 is occupied, the ground that is spot 5 is not visible.
My plan is that when a car is coming down the road it is labeled "car", and when the car parks it is re-labeled "car_6" (if it parks in parking spot 6). Do I need to use some sort of hierarchy to make the change from "car" to "car_6" without problems? (The parking lot is just an example to explain the problem, so there will be no issues with permission to record videos.) The objects that will actually be detected by the model are about the same size as a car, but round. A rough sketch of the re-labeling I have in mind follows this paragraph.
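To make the question concrete, this is the kind of re-labeling logic I have in mind (whether the model does it or it happens in post-processing); the spot rectangles, coordinates, and names here are made-up placeholders:

```
# Sketch: map a detected "car" box to a parking spot by checking where its
# center lies. Spot rectangles are hypothetical, hand-measured pixel
# coordinates (xmin, ymin, xmax, ymax) in the camera image.
PARKING_SPOTS = {
    "car_5": (100, 200, 180, 260),
    "car_6": (100, 140, 180, 200),
    # ... one entry per spot
}

def relabel_detection(bbox):
    """Return a spot-specific label if the car's center is inside a spot,
    otherwise keep the generic "car" label (car still on the road)."""
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    for spot_label, (xmin, ymin, xmax, ymax) in PARKING_SPOTS.items():
        if xmin <= cx <= xmax and ymin <= cy <= ymax:
            return spot_label
    return "car"
```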
After some advice, I have heard that ssd_mobilenet_v2 is a good model for the problem. As speed is not really that important, I think most models would be able to get the job done, but I have now worked a bit with MobileNet and learned a bit, so I would like to stay with this model.
I'm using TensorFlow 2 on Windows 10.
What I have done so far:
Labeled (with labelImg) around 250 pictures (.xml), converted them to CSV and then to .record. The pictures are 320x320 in JPG format; a sketch of the record format I produce is below.
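For context, the CSV-to-.record step follows the usual TF Object Detection API example format, roughly one record per image like this sketch (the values are placeholders; the field names are the standard ones the API expects):

```
import tensorflow as tf

# Minimal sketch of one TFRecord example in the format the TF Object
# Detection API expects; placeholder values for one labeled box.
def make_example(encoded_jpg, width, height, label_text, label_id, box):
    xmin, ymin, xmax, ymax = box  # normalized to [0, 1]
    feature = {
        "image/encoded": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpg])),
        "image/format": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"jpg"])),
        "image/width": tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        "image/height": tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        "image/object/bbox/xmin": tf.train.Feature(float_list=tf.train.FloatList(value=[xmin])),
        "image/object/bbox/ymin": tf.train.Feature(float_list=tf.train.FloatList(value=[ymin])),
        "image/object/bbox/xmax": tf.train.Feature(float_list=tf.train.FloatList(value=[xmax])),
        "image/object/bbox/ymax": tf.train.Feature(float_list=tf.train.FloatList(value=[ymax])),
        "image/object/class/text": tf.train.Feature(bytes_list=tf.train.BytesList(value=[label_text.encode()])),
        "image/object/class/label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label_id])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
```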
The label map was created as a .txt file and then turned into a .pbtxt (just by changing the file extension in Windows).
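For reference, the label map follows the standard TF Object Detection API format; the class names here are placeholders, and the ids must start at 1 and match the ids written into the .record files:

```
item {
  id: 1
  name: "car"
}
item {
  id: 2
  name: "car_1"
}
# ... one item per class, 9 in total to match num_classes: 9
```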
(filepath) is used to make the paths a bit shorter; I have a very extensive directory structure for this.
These are my training settings:

```
D:/(filepath)/object detection> python model_main_tf2.py --pipeline_config_path D:\(filepath)\pre-trained-models\ssd_mobilenet_v2_320x320_coco17_tpu-8\pipeline.config --model_dir D:\(filepath)\models\model_out
```

with the pipeline config:
```
model {
  ssd {
    num_classes: 9
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2_keras"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 0.005
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.029
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.9700000286102295
          center: true
          scale: true
          epsilon: 0.0010000000474974513
          train: true
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.03
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.9
            center: true
            scale: true
            epsilon: 1.0
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: 0.2
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.5
        iou_threshold: 0.6000000238418579
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.75
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 4
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_rgb_to_gray {
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.008
          total_steps: 50000
          warmup_learning_rate: 0.002
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "D:/Visual_Studio/TensorflowTake3/models/research/object_detection/downloaded_ssd_mobilenet_v2_320x320/checkpoint/ckpt-0"
  num_steps: 30000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "D:/Visual_Studio/TensorflowTake3/label_map/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "D:/Visual_Studio/TensorflowTake3/record_file/tf_cropped/tf_cropped_train.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "D:/Visual_Studio/TensorflowTake3/label_map/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "D:/Visual_Studio/TensorflowTake3/record_file/tf_cropped/tf_cropped_test.record"
  }
}
```

(I hope the config is readable; I had some problems with the formatting when posting it.)
I have tried a number of different versions of this file. As I have a relatively small number of pictures and the "simplicity" of the pictures is quite high, I have tried num_steps from 1,000 up to 50,000, with pretty much the same result.
I have changed the data_augmentation_options to random_horizontal_flip, in the hope of not altering the pictures too much, as the model might benefit from that given that the final environment is very static.
The model will run on an OAK-D camera, used indoors without much natural light, so the color variation the final model sees will be minimal.
After training, this script is used to export:

```
python .\exporter_main_v2.py --input_type image_tensor --pipeline_config_path D:\(filepath)\ssd_mobilenet_v2_320x320_coco17_tpu-8\pipeline.config --trained_checkpoint_dir D:\(filepath)\model_out --output_directory D:\(filepath)\my_model
```
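For reference, this is roughly how I test the exported SavedModel on single images (the paths, image name, and threshold are placeholders):

```
import numpy as np
import tensorflow as tf

# Rough sketch: run the exported SavedModel on one image and print the
# detections above a threshold. Paths are placeholders.
detect_fn = tf.saved_model.load(r"D:\(filepath)\my_model\saved_model")
image = tf.io.decode_jpeg(tf.io.read_file("test_image.jpg"))
input_tensor = tf.expand_dims(image, 0)  # model expects a batch dimension
detections = detect_fn(input_tensor)
scores = detections["detection_scores"][0].numpy()
classes = detections["detection_classes"][0].numpy().astype(np.int32)
for score, cls in zip(scores, classes):
    if score >= 0.5:
        print(f"class {cls} with score {score:.2f}")
```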
This is used for creating an IR:

```
python mo_tf.py --saved_model_dir D:\(filepath)\saved_model\ --data_type=FP16 --scale_values [255,255,255] --output_dir D:\(filepath)\20220324_night --tensorflow_use_custom_operations_config C:\Users\(filepath)\envs\openvino\Lib\site-packages\mo\extensions\front\tf\ssd_support_api_v2.4.json --tensorflow_object_detection_api_pipeline_config D:\(filepath)\model\pipeline.config --log_level DEBUG --tensorboard_logdir D:\(filepath)\logdir
```
For the final conversion I use the online blob converter at http://blobconverter.luxonis.com/ with model source "OpenVINO model", MyriadX compile params -ip U8 (default), and 6 shaves.
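As an alternative to the web UI, I understand the same conversion can be scripted with Luxonis' blobconverter package; a sketch (untested by me, file names are placeholders):

```
import blobconverter  # pip install blobconverter

# Sketch: compile the FP16 IR to a MyriadX blob, aiming to match the
# web UI settings (-ip U8 default, 6 shaves). Paths are placeholders.
blob_path = blobconverter.from_openvino(
    xml="saved_model.xml",
    bin="saved_model.bin",
    data_type="FP16",
    shaves=6,
)
print(blob_path)
```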
When I test the model with the OAK-D I use a script from DepthAI: https://docs.luxonis.com/projects/api/en/develop/samples/MobileNet/video_mobilenet/. This is the OAK-D's RGB pipeline run against video that I have captured of the final environment.
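Condensed, the part of that sample I rely on looks roughly like this (the blob path is a placeholder):

```
import depthai as dai

# Condensed from the DepthAI video_mobilenet sample: a detection network
# fed with frames from the host, with the confidence threshold I vary.
pipeline = dai.Pipeline()
nn = pipeline.create(dai.node.MobileNetDetectionNetwork)
nn.setConfidenceThreshold(0.5)  # the value I experiment with between runs
nn.setBlobPath("model.blob")    # placeholder path to the converted blob

xin = pipeline.create(dai.node.XLinkIn)
xin.setStreamName("inFrame")
xin.out.link(nn.input)

nnOut = pipeline.create(dai.node.XLinkOut)
nnOut.setStreamName("nn")
nn.out.link(nnOut.input)
```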
The problem I get when running it: most often there are bounding boxes all over the screen, not "focusing" on anything in particular; they are just everywhere. When I change n.setConfidenceThreshold(0.5), there is a point around 0.2 that gives me either no bounding boxes or too many bounding boxes focusing on nothing.
When I trained with 50,000 steps I got a row of bounding boxes at the bottom of the screen (in what is "dead space" for the model), not reacting at all to the cars coming into the frame.
As I feel I'm stuck, not knowing what to do, all ideas are welcome. I'm trying to research what to do, but as I don't know what the problem might be, it's tricky to know where to start.
The original photos captured for labeling are in 16:9 aspect ratio. Would it be possible to keep this ratio throughout the process? If so, the positions of the different parking spots would stay intact through training, hopefully enhancing the model's ability to know, from geographical localization, where the different parking spots (and the road) are located.
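If I read the TF Object Detection API protos correctly, the fixed_shape_resizer in my config could be swapped for a keep_aspect_ratio_resizer, something like this (the dimensions are just an example):

```
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 320
    max_dimension: 640
    pad_to_max_dimension: true
  }
}
```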
I have found a way to test the model on images after training, and this gives great results.
My idea in the long run is to get a working model so that I have something to "experiment" with and learn more.
I have read somewhere that someone with the same problem solved it by changing the label_map in the .record file. As I understand it, the .record file more or less consists of serialized examples (tensors). Would it be possible to inspect this file to check the order of the labels (a sketch of what I mean is below), or have I understood this wrong?
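Something like this is what I have in mind for inspecting the .record file (the field names are the standard TF Object Detection API ones; the path is a placeholder):

```
import tensorflow as tf

# Print the class text/id pairs stored in the first few records, to check
# that they match the label map. Path is a placeholder.
dataset = tf.data.TFRecordDataset(r"D:\(filepath)\tf_cropped_train.record")
for raw in dataset.take(5):
    example = tf.train.Example.FromString(raw.numpy())
    texts = example.features.feature["image/object/class/text"].bytes_list.value
    labels = example.features.feature["image/object/class/label"].int64_list.value
    print(list(zip([t.decode() for t in texts], labels)))
```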
As mentioned above, all advice/ideas are welcome, and keep in mind that I'm new to this, which means I could have made a simple mistake somewhere.
Sorry for the long post, and please let me know if there is anything I should elaborate on or explain further.
Best,
Martin
Upvotes: 1
Views: 572
Reputation: 104
From an OpenVINO perspective, you may refer to this Object Detection Python Demo, since that demo is rather similar to your aim. Models that are similar/close to this kind of implementation are also available.
The thing that you need to figure out is how you are going to train the model to differentiate whether a spot is taken or not. You'll need to decide on the stimuli that differentiate those: for example, in IoT, an infrared sensor in the parking lot determines an empty/occupied spot through a threshold value, while in AI it's up to you as the developer to decide how to determine that concept; a rough sketch of one option is below. This might give you some ideas (they even share their source code).
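For example, one simple AI-side criterion is to threshold the overlap (IoU) between a detected car box and each parking spot region; a rough sketch, with an arbitrary example threshold:

```
# Rough sketch: decide occupancy by thresholding the overlap (IoU)
# between a detection box and a parking-spot region. Boxes are
# (xmin, ymin, xmax, ymax); the 0.3 threshold is just an example.
def iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def spot_occupied(spot_box, detections, threshold=0.3):
    return any(iou(spot_box, det) >= threshold for det in detections)
```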
Another thing to note: the OpenCV OAK-D is not part of the OpenVINO Toolkit. Hence, for anything specific to it, you'll need to reach out to their discussion forum or community.
Upvotes: 0