olha

Reputation: 2272

How to estimate a CoreML model's maximal runtime footprint (in megabytes)

Let's say I have a network model made in TensorFlow/Keras/Caffe etc. I can use CoreML Converters API to get a CoreML model file (.mlmodel) from it.

Now that I have a .mlmodel file and know the input and output shapes, how can the maximum RAM footprint be estimated? I know that a model can have many layers, and their combined size can be much bigger than the input/output shapes.

So the questions are:

  1. Can the maximal memory footprint of an .mlmodel be determined with some formula/API, without compiling and running an app?
  2. Is the maximal footprint closer to the memory size of the biggest intermediate layer, or closer to the sum of all the layers' sizes?

Any advice is appreciated. As I am new to CoreML, you may give any feedback and I'll try to improve the question if needed.

Upvotes: 0

Views: 389

Answers (2)

Matthijs Hollemans

Reputation: 7902

I wrote a blog post a few years ago that goes into some of this: https://machinethink.net/blog/how-fast-is-my-model/

However, keep in mind that Core ML's actual behavior is not known. It will most likely try to be as efficient as possible (i.e. reuse the memory for tensors that are no longer needed) but it's a black box, so who knows. The only way to find out is to try your models on an actual device.
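To get a feel for question 2, you can at least bracket the activation memory between two bounds: if the runtime reuses buffers aggressively, the footprint is roughly the biggest single intermediate tensor; if it reuses nothing, it is the sum of all of them. A minimal back-of-envelope sketch, using made-up intermediate shapes (not real Core ML numbers):

```python
# Bracket the activation memory between a best case (full buffer reuse)
# and a worst case (no reuse). The shapes are hypothetical placeholders;
# substitute your own model's intermediate output shapes.
import math

BYTES_PER_FLOAT32 = 4

layer_output_shapes = [      # hypothetical intermediate tensor shapes
    (1, 224, 224, 64),
    (1, 112, 112, 128),
    (1, 56, 56, 256),
]

sizes_mb = [math.prod(s) * BYTES_PER_FLOAT32 / 1024**2
            for s in layer_output_shapes]

# Lower bound: runtime reuses buffers -> only the biggest tensor is alive.
lower_bound_mb = max(sizes_mb)
# Upper bound: no reuse -> every intermediate tensor alive at once.
upper_bound_mb = sum(sizes_mb)

print(f"activations: between {lower_bound_mb:.1f} and {upper_bound_mb:.1f} MB")
```

The real number will land somewhere in between, and as noted above, only a run on an actual device will tell you where.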

Upvotes: 2

Cynichniy Bandera

Reputation: 6103

IMHO, whatever formula you come up with at the end of the day must be based on the number of trainable parameters of the network.

For classification networks it can be found by iterating over the layers, or via the existing API.

In Keras:

import keras.applications.resnet50 as resnet

model = resnet.ResNet50(include_top=True, weights=None, input_tensor=None, input_shape=None, pooling=None, classes=2)
model.summary()

Total params: 23,591,810
Trainable params: 23,538,690
Non-trainable params: 53,120

PyTorch:

def count_parameters(model):
    # sum element counts over all trainable (requires_grad) tensors
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

For detectors, you probably need to do the same for all the important parts of the network (backbone, RPN, etc.), whatever your network consists of.

The second important parameter is the precision of the network. You may have heard of quantization. It changes the precision of the floats for all or some layers, and can be static (the network is trained in the desired precision and calibrated) or dynamic (the network is converted after training). The simplest dynamic quantization replaces floats with integers on the linear layers. For Mask R-CNN in PyTorch this gives roughly a 30% smaller file and a substantial reduction in memory consumption, with the same number of trainable parameters.
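The size effect of dynamic quantization is easy to estimate by hand: quantized layers drop from 4 bytes to 1 byte per parameter, the rest stay fp32. A quick sketch with hypothetical parameter counts (not measured from a real Mask R-CNN):

```python
# Rough weight-size comparison for dynamic quantization, assuming only
# the linear layers are quantized (fp32 -> int8). The parameter counts
# below are hypothetical, for illustration only.
linear_params = 20_000_000   # params in linear layers (assumed)
other_params = 24_000_000    # params left in fp32 (assumed)

FP32_BYTES = 4
INT8_BYTES = 1

before_mb = (linear_params + other_params) * FP32_BYTES / 1024**2
after_mb = (linear_params * INT8_BYTES + other_params * FP32_BYTES) / 1024**2

print(f"{before_mb:.0f} MB -> {after_mb:.0f} MB "
      f"({1 - after_mb / before_mb:.0%} smaller)")
```

The more of the network's weight lives in quantizable layers, the closer you get to the full 4x reduction.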

So the final estimate looks like size = number_of_trainable_parameters * precision * X, where X is some factor you have to determine empirically for your particular network and Core ML specifics )
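Plugging the ResNet50 numbers from above into that rule of thumb, with a made-up overhead factor X (you would need to measure the real one on a device):

```python
# size = number_of_trainable_parameters * precision * X
trainable_params = 23_538_690  # from the Keras summary above
precision_bytes = 4            # fp32
X = 1.2                        # hypothetical overhead factor, for illustration

size_mb = trainable_params * precision_bytes * X / 1024**2
print(f"estimated footprint: {size_mb:.0f} MB")
```

With X = 1 this is just the raw weight size (~90 MB here); anything above that accounts for activations and runtime overhead.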

Upvotes: 2
