Reputation: 351
I'm trying to train a vehicle detector on 4K-resolution images with about 100 small vehicles per image (each vehicle is about 100x100 pixels).
I'm currently using the full resolution, which costs a lot of memory. I'm training on 32 cores with 128 GB of RAM. The current architecture is Faster R-CNN. I can train with a second_stage_batch_size of 12 and a first_stage_minibatch_size of 50 (I scaled both down until the model fit in memory).
So far I have split my dataset into only three classes (cars, trucks, vans).
I assume giving additional information like:
would improve the training process.
Would truncated include overlapped vehicles?
Would additional information such as views/poses and other labels also improve the training process, or would it make training harder?
Upvotes: 1
Views: 1255
Reputation: 55
So it looks like changing the data works correctly. I'm not sure how it affects the model; basically, it's pretrained without these examples and then fine-tuned with them.
LOG:
INFO:tensorflow:Restoring parameters from /home/.../train_output/model.ckpt-3190
Real-life example from https://arxiv.org/abs/1311.2901: the ImageNet and Caltech datasets have some images in common. When evaluating how well a model trained on ImageNet performs with Caltech as the validation set, you should remove the duplicates from ImageNet before training.
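To illustrate the idea of removing overlap between a training set and a validation set, here is a minimal sketch. It only catches byte-identical files by comparing SHA-256 digests, so resized or re-encoded copies would slip through, and it is not the method used in the linked paper. The function names `file_digest` and `remove_overlap` are made up for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def remove_overlap(train_paths, val_paths):
    """Return the training paths whose contents do not occur in the validation set.

    Assumption: a "duplicate" means a byte-identical file. Near-duplicates
    (different size, crop, or compression) are NOT detected by this check.
    """
    val_digests = {file_digest(p) for p in val_paths}
    # Keep the original order of the training list.
    return [p for p in train_paths if file_digest(p) not in val_digests]
```

You would call it as `clean_train = remove_overlap(train_files, val_files)` and then build the training records from `clean_train` only. For near-duplicates, something like a perceptual hash or a feature-distance check would be needed instead.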
Upvotes: 1
Reputation: 1558
For your problem, the out-of-the-box config files won't work well because of the high image resolution and the small cars. I recommend:
Upvotes: 1