Reputation: 23
We are facing an issue where loading a new version of a model in TensorFlow Serving takes a very long time during the warmup process (about 10 minutes in our case).
While investigating the issue, we found that the warmup process uses only 1 CPU core. We tried disabling warmup, but then the first request to the model also takes 10 minutes and still uses only 1 CPU core. Subsequent requests (with or without warmup) use all available CPU cores.
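For context, we provide warmup requests to TensorFlow Serving as a TFRecord file of PredictionLog protos under the model's assets.extra directory; a minimal sketch of how such a file is generated (the model name, signature name, and input tensor below are placeholders, not our real ones):

```python
import tensorflow as tf
from tensorflow_serving.apis import model_pb2, predict_pb2, prediction_log_pb2

# Write the warmup records next to the SavedModel:
# <model_base_path>/<version>/assets.extra/tf_serving_warmup_requests
with tf.io.TFRecordWriter("assets.extra/tf_serving_warmup_requests") as writer:
    request = predict_pb2.PredictRequest(
        model_spec=model_pb2.ModelSpec(name="my_model",
                                       signature_name="serving_default"),
        # Placeholder input name and shape.
        inputs={"input": tf.make_tensor_proto([[0.0] * 128], dtype=tf.float32)},
    )
    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request))
    writer.write(log.SerializeToString())
```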
We are aware of lazy loading, but our question is: how can we make the warmup (or the first request) use all available CPU cores so that loading a new model version does not take 10 minutes?
We have tried different TensorFlow Serving parameters, including tensorflow_intra_op_parallelism, tensorflow_inter_op_parallelism, and ModelWarmupOptions.num_model_warmup_threads, but none of them helped.
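Roughly, this is how we launch the server with those settings (paths and values are placeholders; the warmup-thread flag name is our assumption based on ModelWarmupOptions.num_model_warmup_threads and may not exist as a CLI flag in every version):

```bash
# tensorflow_intra_op_parallelism / tensorflow_inter_op_parallelism are the
# documented tensorflow_model_server flags; --num_model_warmup_threads is an
# assumption mirroring ModelWarmupOptions.num_model_warmup_threads.
tensorflow_model_server \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --rest_api_port=8501 \
  --tensorflow_intra_op_parallelism=16 \
  --tensorflow_inter_op_parallelism=16 \
  --num_model_warmup_threads=4
```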
We also observed that the issue might not be specific to TensorFlow Serving: when we load the model in a plain Python script and run the first request, it also uses only 1 CPU core and takes a long time.
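A minimal sketch of the kind of script we use to reproduce this outside of TensorFlow Serving (the model path, signature, input name, and input shape are placeholders):

```python
import time
import numpy as np
import tensorflow as tf

# Thread settings must be applied before any TF op runs.
tf.config.threading.set_intra_op_parallelism_threads(16)
tf.config.threading.set_inter_op_parallelism_threads(16)

model = tf.saved_model.load("/models/my_model/1")           # placeholder path
infer = model.signatures["serving_default"]                 # placeholder signature
batch = tf.constant(np.zeros((1, 128), dtype=np.float32))   # placeholder input

for i in range(3):
    start = time.perf_counter()
    infer(input=batch)  # "input" is a placeholder input name
    print(f"request {i}: {time.perf_counter() - start:.1f}s")
# The first call is the slow, single-core one; subsequent calls use all cores.
```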
We tested this on versions 2.5, 2.15, and 2.17.
Upvotes: 0
Views: 35