Reputation: 2272
Let's say I have a network model made in TensorFlow/Keras/Caffe etc. I can use the CoreML Converters API to get a CoreML model file (.mlmodel) from it.
Now, as I have a .mlmodel file and know the input shape and output shape, how can a maximum RAM footprint be estimated?
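For the input/output tensors alone, a rough lower bound is easy to compute from the shapes (this is just my own sketch, assuming float32, i.e. 4 bytes per element):

```python
from functools import reduce
import operator

def tensor_bytes(shape, bytes_per_element=4):
    # One tensor's memory: product of its dimensions times the element size.
    # 4 bytes per element assumes float32.
    return reduce(operator.mul, shape, 1) * bytes_per_element

# A 1x224x224x3 float32 input buffer:
print(tensor_bytes((1, 224, 224, 3)))  # 602112 bytes, ~0.57 MB
```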
I know that a model can have a lot of layers, and their total size can be much bigger than the input/output shapes.
So the question is: can the .mlmodel memory footprint be estimated with some formula/API, without compiling and running an app?

Any advice is appreciated. As I am new to CoreML, feel free to give any feedback and I'll try to improve the question if needed.
Upvotes: 0
Views: 389
Reputation: 7902
I wrote a blog post a few years ago that goes into some of this: https://machinethink.net/blog/how-fast-is-my-model/
However, keep in mind that Core ML's actual memory behavior is not documented. It will most likely try to be as efficient as possible (i.e. reuse the memory of tensors that are no longer needed), but it's a black box, so who knows. The only way to find out is to try your models on an actual device.
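To illustrate how much the reuse assumption matters, here is a sketch of the two extremes (this is my own toy model, not Core ML's actual algorithm; `layer_shapes` is a hypothetical list of per-layer element counts):

```python
def estimate_peak_activation_bytes(layer_shapes, bytes_per_element=4):
    # layer_shapes: list of (input_elements, output_elements) per layer.
    # With perfect buffer reuse, peak activation memory is the largest
    # single layer's input+output working set; with no reuse, it is the sum.
    per_layer = [(i + o) * bytes_per_element for i, o in layer_shapes]
    return max(per_layer), sum(per_layer)

# Two layers: 100 -> 200 elements, then 200 -> 50 elements (float32):
best_case, worst_case = estimate_peak_activation_bytes([(100, 200), (200, 50)])
print(best_case, worst_case)  # 1200 2200
```

The real runtime sits somewhere between those two numbers, which is why measuring on-device is the only reliable answer.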
Upvotes: 2
Reputation: 6103
IMHO, whatever formula you come up with must, at the end of the day, be based on the number of trainable parameters of the network.
For classification networks it can be found by iterating over the layers, or the existing APIs can be used.
In Keras:

import keras.applications.resnet50 as resnet

model = resnet.ResNet50(include_top=True, weights=None, input_tensor=None,
                        input_shape=None, pooling=None, classes=2)
model.summary()
# Total params: 23,591,810
# Trainable params: 23,538,690
# Non-trainable params: 53,120
In PyTorch:

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
For detectors, you probably need to do the same for all the important parts of the network (backbone, RPN, etc.), whatever your network consists of.
The second important parameter is the precision of the network. You may have heard about quantization. It changes the float precision for all or some layers, and can be static (when the network is trained in the desired precision and calibrated) or dynamic (when the network is converted after training). The simplest dynamic quantization replaces floats with some kind of ints on linear layers. For Mask R-CNN in PyTorch it results in a roughly 30% smaller file size and a substantial reduction in memory consumption, with the same number of trainable parameters.
So the final equation is something like size = number_of_trainable_parameters * precision * X, where precision is the number of bytes per parameter and X is some factor you have to find out for your particular network and Core ML specifics :)
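That equation can be sketched in code (the function name and the float32 default of 4 bytes per parameter are my assumptions; X is the empirical overhead factor you still have to measure):

```python
def estimate_model_bytes(trainable_params, bytes_per_param=4, overhead_factor=1.0):
    # size ~= number_of_trainable_parameters * precision * X
    # bytes_per_param: 4 for float32, 2 for float16, 1 for int8.
    # overhead_factor is the empirical X from the formula above.
    return int(trainable_params * bytes_per_param * overhead_factor)

# ResNet50's trainable-parameter count from the Keras example:
print(estimate_model_bytes(23_538_690))  # 94154760, about 94 MB for weights alone
```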
Upvotes: 2