Horst Lemke

Reputation: 351

Tensorflow Object Detection training best practice questions

Training on large scale images:

I'm trying to train a vehicle detector on Images with 4K-Resolution with about 100 small-sized vehicles per image (vehicle size about 100x100 pixel).

I'm currently using the full resolution, which costs me a lot of memory. I'm training using 32 cores and 128 GB RAM. The current architecture is Faster RCNN. I can train with a second stage batch size of 12 and a first_stage_mini_batch_size of 50. (I scaled both down until my memory was sufficient).

  1. I assume that I should increase the maximum number of RPN proposals. What magnitude would be appropriate?
  2. Does this approach make sense?
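
Raising the proposal limits is expressed in the pipeline config. The following fragment is a hedged sketch using field names from the TF Object Detection API's `faster_rcnn` and `post_processing` protos; the concrete values are illustrative guesses to tune, not recommendations:

```
model {
  faster_rcnn {
    # With ~100 objects per image, the default proposal cap may be too low.
    first_stage_max_proposals: 500        # illustrative value; tune upward from the default
    second_stage_post_processing {
      batch_non_max_suppression {
        max_detections_per_class: 200     # illustrative value
        max_total_detections: 300         # illustrative value
      }
    }
  }
}
```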

Difficulty, truncated, labels and poses:

I currently separated my dataset only into three classes (cars, trucks, vans).

I assume giving additional information like:

    • difficult (for mostly hidden vehicles), and
    • truncated (I currently did not label truncated objects, but I could)

would improve the training process.

  1. Would "truncated" include overlapping vehicles?

  2. Would additional information like views/poses and other labels also improve the training process, or would it make training harder?

Adding new data to the training set:

  1. Is it possible to add new images and objects into the training and validation record files and automatically resume the training using the latest checkpoint file from the training directory? Or is the option "fine_tune_checkpoint" with "from_detection_checkpoint" necessary?
  2. Would it harm, if a random separation of training and validation data would pick different datasets than in the training before?
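
On the second point, one common way to avoid the split changing when new images are added is to make the train/validation assignment a deterministic function of the filename rather than a random draw. A minimal sketch (the function name and 20% fraction are my own choices, not from the TF Object Detection API):

```python
import hashlib

def split_bucket(filename, val_fraction=0.2):
    """Assign a file to 'train' or 'val' deterministically.

    The bucket depends only on the filename, so adding new images
    later never moves an already-assigned image across the split.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits of the hash to a number in [0, 1].
    fraction = int(digest[:8], 16) / 0xFFFFFFFF
    return "val" if fraction < val_fraction else "train"
```

With this scheme, regenerating the record files after adding data keeps every old image in its original bucket, so the validation set is never contaminated by examples the checkpoint was trained on.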

Upvotes: 1

Views: 1255

Answers (2)

Piotr Ciesiołkiewicz

Reputation: 55

  1. I switched the evaluation and training data (in the config) and training continued as normal when restarted with exactly the same command:
    • there is a log entry about restoring parameters from the last checkpoint,
    • as soon as I switched the test/train data, mAP shot to the moon, and
    • the Images tab in TensorBoard gets updated.

So it looks like changing the data works correctly. I'm not sure how it affects the model; essentially it was pretrained without these examples and is then fine-tuned with them.

LOG:

    INFO:tensorflow:Restoring parameters from /home/.../train_output/model.ckpt-3190

  2. This results in train/test contamination, and the real model performance is likely lower than the one calculated on the contaminated validation set. You shouldn't worry about it too much unless you want to present well-defined results.

Real-life example from https://arxiv.org/abs/1311.2901 : the ImageNet and Caltech datasets have some images in common. When evaluating how well a model trained on ImageNet performs with Caltech as the validation set, you should remove the duplicates from ImageNet before training.
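
Exact duplicates across two datasets can be found by hashing file contents rather than comparing filenames. A minimal sketch (the function name and the `read_bytes` callback are my own constructions for illustration):

```python
import hashlib

def dedupe_training_set(train_paths, val_paths, read_bytes):
    """Drop training images whose exact content also appears in validation.

    read_bytes(path) -> bytes. Hashing the content catches byte-identical
    duplicates even when the two datasets use different filenames.
    """
    val_hashes = {hashlib.md5(read_bytes(p)).hexdigest() for p in val_paths}
    return [p for p in train_paths
            if hashlib.md5(read_bytes(p)).hexdigest() not in val_hashes]
```

Note this only catches byte-identical files; re-encoded or resized copies of the same image would need perceptual hashing instead.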

Upvotes: 1

Jonathan Huang

Reputation: 1558

For your problem, the out-of-the-box config files won't work so well due to the high resolutions of the images and the small cars. I recommend:

  • Training on crops --- cut your image into smaller crops, keeping the cars roughly at about the same resolution as they are now.
  • Eval on crops --- at inference time, cut up your image into a bunch of overlapping crops, and run inference on each one of those crops. Usually people combine the detections across the multiple crops using non-max-suppression. See slide 25 here for an illustration of this.
  • I highly recommend training using a GPU or better yet, multiple GPUs.
  • Avoid tweaking the batch_size parameters to begin with --- they are set up to work quite well out of the box and changing them will often make it difficult to debug.
  • Currently the difficult/truncated/pose fields are not used during training, so including them won't make a difference.
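
The crop-based inference described above can be sketched as follows. This is my own minimal illustration, not code from the TF Object Detection API; crop size, overlap, and IoU threshold are arbitrary placeholders:

```python
import numpy as np

def make_crops(width, height, crop=1000, overlap=200):
    """Return (x, y) top-left corners of overlapping crops covering the image."""
    step = crop - overlap
    xs = list(range(0, max(width - crop, 0) + 1, step))
    ys = list(range(0, max(height - crop, 0) + 1, step))
    # Make sure the right and bottom edges are covered.
    if xs[-1] + crop < width:
        xs.append(width - crop)
    if ys[-1] + crop < height:
        ys.append(height - crop)
    return [(x, y) for y in ys for x in xs]

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression over [x1, y1, x2, y2] boxes.

    After shifting each crop's detections back into full-image
    coordinates, this merges the duplicates from overlapping crops.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```

For a 4K frame, running the detector on each crop of `make_crops(3840, 2160)`, offsetting the resulting boxes by the crop origin, and passing everything through `nms` gives one merged detection list for the full image.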

Upvotes: 1
