djvaroli

Reputation: 1383

Saved Model file size is the same after pruning with Tensorflow Model Optimization

I have a model that is around 1.1gb when I save it with the model.save() API. That's a bit too big for my liking so I tried pruning it following the official Tensorflow tutorial (https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras).

After following the tutorial, the zipped version of the model is about 10 times smaller, but when I save the model with the same model.save() API, the file is exactly the same size. I did this before with another model, and that one actually did see about a 3x reduction in the final size after saving.

My question is: am I misunderstanding what pruning is supposed to achieve, and can I use it to shrink my models for TensorFlow Serving to reduce resource use in production? I get that saving just the weights would result in the same file size, since you still have the same number of weights. However, I was under the impression that exporting the model to a .pb file for serving would perform the needed compression, allowing you to make use of those pruned-out weights.

Otherwise, it feels like there is no advantage to doing this other than storage space, which isn't really the concern I have.

I know that I can also convert the model to the TFLite format, but I can't find documentation saying whether or not I will be able to serve that with TensorFlow Serving, so I am not sure if that will work for me.

Upvotes: 2

Views: 1020

Answers (1)

lclaxton

Reputation: 11

The reason you are seeing no change in model size with pruning alone is that the weights are still stored as floating-point values; they are just set to 0. If you subsequently zip the model, you will see that a pruned model is smaller than the original, because generic compression exploits the repeated zeros. This can be useful for moving the model around, for example sending it to an edge AI device that is bandwidth-limited.
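You can see this effect without TensorFlow at all. The sketch below (standard library only; the tensor sizes and 90% sparsity level are made-up illustrative numbers, not taken from the question) simulates a weight tensor, zeros out most of it the way magnitude pruning does, and compares raw size vs. zlib-compressed size:

```python
import random
import struct
import zlib

random.seed(0)

# Simulate a weight tensor: 100,000 float32 values.
n = 100_000
dense = [random.uniform(-1, 1) for _ in range(n)]

# "Prune" ~90% of the weights by setting them to zero.
# The tensor shape (and element count) is unchanged -- this is
# why a saved model does not shrink.
pruned = [0.0 if random.random() < 0.9 else w for w in dense]

raw_dense = struct.pack(f"{n}f", *dense)
raw_pruned = struct.pack(f"{n}f", *pruned)

# Raw (saved-model-like) size is identical: same number of floats.
assert len(raw_dense) == len(raw_pruned)

# Generic compression, however, exploits the runs of zero bytes.
print(len(zlib.compress(raw_dense)), len(zlib.compress(raw_pruned)))
```

The two raw buffers are byte-for-byte the same length, while the compressed pruned buffer comes out far smaller, which mirrors the "same .pb size, 10x smaller zip" behavior described in the question.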

Alternatively, you can convert the pruned weight tensors to a sparse matrix representation, which should also reduce the size. I believe this is available in TF but not PyTorch. However, by converting to a sparse matrix you may limit how your model can be used.

Upvotes: 1
