Horst Lemke

Reputation: 351

Tensorflow Object Detection training best practice questions

Training on large scale images:

I'm trying to train a vehicle detector on Images with 4K-Resolution with about 100 small-sized vehicles per image (vehicle size about 100x100 pixel).

I'm currently using the full resolution, which costs me a lot of memory. I'm training using 32 cores and 128 GB RAM. The current architecture is Faster RCNN. I can train with a second stage batch size of 12 and a first_stage_mini_batch_size of 50. (I scaled both down until my memory was sufficient).

  1. I assume that I should increase the maximum number of RPN proposals. What magnitude would be appropriate?
  2. Does this approach make sense?
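
Raising the proposal limits is expressed in the pipeline config. The following fragment is a hedged sketch using field names from the TF Object Detection API's `faster_rcnn` and `post_processing` protos; the concrete values are illustrative guesses to tune, not recommendations:

```
model {
  faster_rcnn {
    # With ~100 objects per image, the default proposal cap may be too low.
    first_stage_max_proposals: 500        # illustrative value; tune upward from the default
    second_stage_post_processing {
      batch_non_max_suppression {
        max_detections_per_class: 200     # illustrative value
        max_total_detections: 300         # illustrative value
      }
    }
  }
}
```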

Difficulty, truncated, labels and poses:

I currently separated my dataset only into three classes (cars, trucks, vans).

I assume giving additional information like:

    • difficult (for mostly hidden vehicles), and
    • truncated (I currently did not label truncated objects, but I could)

would improve the training process.

  1. Would "truncated" include overlapping vehicles?

  2. Would additional information like views/poses and other labels also improve the training process, or would it make training harder?

Adding new data to the training set:

  1. Is it possible to add new images and objects into the training and validation record files and automatically resume the training using the latest checkpoint file from the training directory? Or is the option "fine_tune_checkpoint" with "from_detection_checkpoint" necessary?
  2. Would it harm, if a random separation of training and validation data would pick different datasets than in the training before?
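
On the second point, one common way to avoid the split changing when new images are added is to make the train/validation assignment a deterministic function of the filename rather than a random draw. A minimal sketch (the function name and 20% fraction are my own choices, not from the TF Object Detection API):

```python
import hashlib

def split_bucket(filename, val_fraction=0.2):
    """Assign a file to 'train' or 'val' deterministically.

    The bucket depends only on the filename, so adding new images
    later never moves an already-assigned image across the split.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits of the hash to a number in [0, 1].
    fraction = int(digest[:8], 16) / 0xFFFFFFFF
    return "val" if fraction < val_fraction else "train"
```

With this scheme, regenerating the record files after adding data keeps every old image in its original bucket, so the validation set is never contaminated by examples the checkpoint was trained on.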

Upvotes: 1

Views: 1255

Answers (2)

Piotr Ciesiołkiewicz

Reputation: 55

  1. I switched the evaluation and training data (in the config) and training continued as normal when restarted with exactly the same command:
    • there is a log entry about restoring parameters from the last checkpoint,
    • as soon as I switched the test/train data, mAP shot to the moon, and
    • the Images tab in TensorBoard gets updated.

So it looks like changing the data works correctly. I'm not sure how it affects the model; essentially it was pretrained without these examples and is then fine-tuned with them.

LOG:

    INFO:tensorflow:Restoring parameters from /home/.../train_output/model.ckpt-3190

  2. This results in train/test contamination, and the real model performance is likely lower than the one calculated on the contaminated validation set. You shouldn't worry about it too much unless you want to present well-defined results.

Real-life example from https://arxiv.org/abs/1311.2901 : the ImageNet and Caltech datasets have some images in common. When evaluating how well a model trained on ImageNet performs with Caltech as the validation set, you should remove the duplicates from ImageNet before training.
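
Exact duplicates across two datasets can be found by hashing file contents rather than comparing filenames. A minimal sketch (the function name and the `read_bytes` callback are my own constructions for illustration):

```python
import hashlib

def dedupe_training_set(train_paths, val_paths, read_bytes):
    """Drop training images whose exact content also appears in validation.

    read_bytes(path) -> bytes. Hashing the content catches byte-identical
    duplicates even when the two datasets use different filenames.
    """
    val_hashes = {hashlib.md5(read_bytes(p)).hexdigest() for p in val_paths}
    return [p for p in train_paths
            if hashlib.md5(read_bytes(p)).hexdigest() not in val_hashes]
```

Note this only catches byte-identical files; re-encoded or resized copies of the same image would need perceptual hashing instead.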

Upvotes: 1

Jonathan Huang

Reputation: 1558

For your problem, the out-of-the-box config files won't work so well due to the high resolutions of the images and the small cars. I recommend:

  • Training on crops --- cut your image into smaller crops, keeping the cars roughly at about the same resolution as they are now.
  • Eval on crops --- at inference time, cut up your image into a bunch of overlapping crops, and run inference on each one of those crops. Usually people combine the detections across the multiple crops using non-max-suppression. See slide 25 here for an illustration of this.
  • I highly recommend training using a GPU or better yet, multiple GPUs.
  • Avoid tweaking the batch_size parameters to begin with --- they are set up to work quite well out of the box and changing them will often make it difficult to debug.
  • Currently the difficult/truncated/pose fields are not used during training, so including them won't make a difference.
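
The crop-based inference described above can be sketched as follows. This is my own minimal illustration, not code from the TF Object Detection API; crop size, overlap, and IoU threshold are arbitrary placeholders:

```python
import numpy as np

def make_crops(width, height, crop=1000, overlap=200):
    """Return (x, y) top-left corners of overlapping crops covering the image."""
    step = crop - overlap
    xs = list(range(0, max(width - crop, 0) + 1, step))
    ys = list(range(0, max(height - crop, 0) + 1, step))
    # Make sure the right and bottom edges are covered.
    if xs[-1] + crop < width:
        xs.append(width - crop)
    if ys[-1] + crop < height:
        ys.append(height - crop)
    return [(x, y) for y in ys for x in xs]

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression over [x1, y1, x2, y2] boxes.

    After shifting each crop's detections back into full-image
    coordinates, this merges the duplicates from overlapping crops.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```

For a 4K frame, running the detector on each crop of `make_crops(3840, 2160)`, offsetting the resulting boxes by the crop origin, and passing everything through `nms` gives one merged detection list for the full image.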

Upvotes: 1
