djvaroli

Reputation: 1383

Saved Model file size is the same after pruning with Tensorflow Model Optimization

I have a model that is around 1.1gb when I save it with the model.save() API. That's a bit too big for my liking so I tried pruning it following the official Tensorflow tutorial (https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras).

After following the tutorial, the zipped version of the model is about 10 times smaller, but when I save the model with the same model.save() API, the file is exactly the same size. I did this before with another model, and that one actually did see about a 3x reduction in the final size after saving.

My question is: am I misunderstanding what pruning is supposed to achieve, and can I use it to shrink my models for TensorFlow Serving to reduce resource use in production? I get that saving just the weights would result in the same file size, since you still have the same number of weights. However, I was under the impression that exporting the model to a .pb file for serving would perform the needed compression, allowing you to make use of those pruned-out weights.

Otherwise, it feels like there is no advantage to doing this other than storage space, which isn't really the concern I have.

I know that I can also convert the model to the TFLite format, but I can't find documentation saying whether or not I will be able to serve that with TensorFlow Serving, so I am not sure if that will work for me.

Upvotes: 2

Views: 1020

Answers (1)

lclaxton

Reputation: 11

The reason you are seeing no change in model size with pruning alone is that the weights are still stored as floating-point values; they are just set to 0. If you subsequently zip the model, you will see that a pruned model is smaller than the original, because generic compression exploits the repeated zeros. This can be useful for moving the model around, for example sending it to an edge AI device that is bandwidth-limited.
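You can see this effect without TensorFlow at all. The sketch below (standard library only; the tensor sizes and 90% sparsity level are made-up illustrative numbers, not taken from the question) simulates a weight tensor, zeros out most of it the way magnitude pruning does, and compares raw size vs. zlib-compressed size:

```python
import random
import struct
import zlib

random.seed(0)

# Simulate a weight tensor: 100,000 float32 values.
n = 100_000
dense = [random.uniform(-1, 1) for _ in range(n)]

# "Prune" ~90% of the weights by setting them to zero.
# The tensor shape (and element count) is unchanged -- this is
# why a saved model does not shrink.
pruned = [0.0 if random.random() < 0.9 else w for w in dense]

raw_dense = struct.pack(f"{n}f", *dense)
raw_pruned = struct.pack(f"{n}f", *pruned)

# Raw (saved-model-like) size is identical: same number of floats.
assert len(raw_dense) == len(raw_pruned)

# Generic compression, however, exploits the runs of zero bytes.
print(len(zlib.compress(raw_dense)), len(zlib.compress(raw_pruned)))
```

The two raw buffers are byte-for-byte the same length, while the compressed pruned buffer comes out far smaller, which mirrors the "same .pb size, 10x smaller zip" behavior described in the question.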

Alternatively, you can convert the pruned weight tensors to a sparse matrix representation, which should also reduce the size. I believe this is available in TF but not PyTorch. However, by converting to a sparse matrix you may limit how your model can be used.

Upvotes: 1
