Reputation: 113
I trained a faster_rcnn_nas model on my custom dataset (images resized to 1280x1080). My GPU is an Nvidia Quadro P5000 and I can test the model on this computer. When I test with a GTX 1060 it crashes and gives a memory error. But when I test the pre-trained faster_rcnn_nas it works fine.
What are the differences between the pre-trained and the custom model? Is there any way to run the model on a 1060? Or is there a batch_size or similar parameter to change for testing?
What I have done: I limited my GPU memory and found out that I need a minimum of 8 GB of GPU memory to test my model.
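For reference, a minimal sketch of how the GPU memory can be capped with a TensorFlow 1.x session config (the fraction below is just an example value):

```python
import tensorflow as tf

# Allow this process to use only a fraction of the GPU's memory.
# Example: 0.5 on a 16 GB Quadro P5000 behaves roughly like an 8 GB card.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... run the detection graph here as usual ...
    pass
```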
Full error:
ResourceExhaustedError: 2 root error(s) found. (0) Resource exhausted: OOM when allocating tensor with shape[500,4032,17,17] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node MaxPool2D/MaxPool-0-TransposeNHWCToNCHW-LayoutOptimizer}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Sum/_275]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[500,4032,17,17] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node MaxPool2D/MaxPool-0-TransposeNHWCToNCHW-LayoutOptimizer}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations. 0 derived errors ignored.
Upvotes: 2
Views: 1928
Reputation: 1836
For your Question 1: What are the differences between pre-trained and custom models?
The difference lies in whether you are training the model or only using it for inference.
Inference requires much less memory than training, because training also has to keep activations, gradients, and optimizer state in memory for backpropagation.
Training with TensorFlow on a GPU requires more memory than CPU-only training, but gives much faster execution, especially with complex model architectures (e.g. Faster R-CNN).
This is particularly noticeable when the model uses computation-heavy layers such as convolutional layers: the speed-up of the calculations is more dramatic, at the cost of more memory.
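For example, with the Object Detection API inference normally only loads the exported frozen graph, so no gradient or optimizer tensors are ever created. A rough TF 1.x sketch (the path is a placeholder):

```python
import tensorflow as tf

# Placeholder path to the exported custom model
PATH_TO_FROZEN_GRAPH = 'exported_model/frozen_inference_graph.pb'

# Only the inference graph is loaded: no gradients, no optimizer state,
# so the memory footprint is mostly the weights plus forward-pass activations.
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')
```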
Question 2: Is there any way to run the model with the 1060? Or is there a batch_size or similar parameter to change for testing?
When testing or running inference, the data length is typically arbitrary. You can check this in the input shape: if input_shape = (None, ##, ##), the first dimension is None, which means the model will accept data of any length, with the lowest possible data length being 1 (see the toy sketch below).
This means you can only make use of batch_size when it is explicitly defined in the input shape, i.e. (None, BATCH_SIZE, ##, ##), or when you feed the model data of length batch_size, i.e. (BATCH_SIZE, ##, ##).
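As a toy illustration with Keras (not the Faster R-CNN model itself; the shapes are made up):

```python
import numpy as np
import tensorflow as tf

# Toy model: the batch dimension is left unspecified, so it shows up as None.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(17, 17, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),
])
print(model.input_shape)  # (None, 17, 17, 3) -- first dimension is None

# Any data length is accepted, down to a single example.
print(model.predict(np.zeros((1, 17, 17, 3))).shape)   # (1, 1)
print(model.predict(np.zeros((32, 17, 17, 3))).shape)  # (32, 1)
```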
One way to avoid the memory error is to lower the batch_size parameter of model.fit and model.predict. A smaller batch size uses less GPU memory per step and may even help accuracy slightly, but training will take longer.
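For example, with the same kind of toy Keras model (stand-in data; the values are only illustrative):

```python
import numpy as np
import tensorflow as tf

# Small stand-in model and data, just to show the batch_size argument.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(17, 17, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

x = np.random.rand(256, 17, 17, 3).astype('float32')
y = np.random.rand(256, 1).astype('float32')

# A smaller batch_size means less GPU memory per step, but more steps per epoch.
model.fit(x, y, batch_size=4, epochs=1)

# The same argument exists at inference time.
preds = model.predict(x, batch_size=4)
print(preds.shape)  # (256, 1)
```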
Another is to convert your dataset into a dataset generator (for example a tf.data input pipeline), which loads data in batches on demand rather than loading everything into memory at once.
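A minimal TensorFlow 2-style tf.data sketch (file pattern, image size, and batch size are placeholders):

```python
import tensorflow as tf

# Hypothetical file pattern; replace with your own dataset.
image_files = tf.data.Dataset.list_files('images/*.jpg')

def load_image(path):
    # Decode and resize one image at a time instead of
    # holding the whole dataset in memory.
    data = tf.io.read_file(path)
    image = tf.image.decode_jpeg(data, channels=3)
    image = tf.image.resize(image, (640, 640))
    return image

dataset = (image_files
           .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .batch(1)  # small batches keep peak GPU memory low
           .prefetch(tf.data.experimental.AUTOTUNE))

for batch in dataset:
    pass  # run inference on each batch here
```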
You can read more about building input pipelines in TensorFlow at this link.
Upvotes: 1