Reputation: 111
I'm deploying the TensorFlow Serving image (tensorflow/serving:latest-gpu) on Kubernetes (GKE) with GPU nodes (K80) hosted on GCP.
Command:
command: ["tensorflow_model_server"]
args: ["--port=8500", "--rest_api_port=8501", "--enable_batching", "--batching_parameters_file=/etc/config/batching_parameters","--model_config_file=/etc/config/model_config"]
Batching parameters:
maxBatchSize: 4096
batchTimeoutMicros: 25000
maxEnqueuedBatches: 16
numBatchThreads: 16
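For reference, TensorFlow Serving reads the file passed to --batching_parameters_file as protocol buffer text format, so on disk the values above correspond to:

    max_batch_size { value: 4096 }
    batch_timeout_micros { value: 25000 }
    max_enqueued_batches { value: 16 }
    num_batch_threads { value: 16 }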
I use --model_config_file to load model versions from a GCS bucket. TensorFlow Serving pulls every new model version and loads it; when that's done it unloads the old one (but it looks like the old version is kept in memory).
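For context, the model_config file looks roughly like this (the bucket and model names are placeholders, not the real ones):

    model_config_list {
      config {
        name: "my_model"
        base_path: "gs://my-bucket/models/my_model"
        model_platform: "tensorflow"
        # Default version policy: serve only the latest version, unloading older ones.
      }
    }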
When I set a memory limit/request below the maximum available resources on the host, the pod ends up OOMKilled because it uses more memory than allowed. But when the limit/request matches the maximum available resources on the host (dedicated node), the memory seems to be flushed to respect that maximum.
Do you know if we can set a memory cap for TensorFlow Serving, or tell it to respect the cgroup memory limits (the ones used by Docker/Kubernetes)?
Can we fully flush old model versions to release their memory?
Also, every time I execute a request the memory usage increases and is never released. Do you have any idea why?
Node info:
7 vCPU
30 GB RAM
1 GPU K80
Model size: ~8 GB
limit/request memory: 20 GB or 30 GB -> OOMKilled after several model version loads
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
No limit/request -> TensorFlow Serving ends up evicted by Kubernetes because of high memory consumption.
Status: Failed
Reason: Evicted
Message: The node was low on resource: memory. Container tensorserving was using 24861136Ki, which exceeds its request of 0.
Thanks,
Regards
Vince
Upvotes: 4
Views: 1520
Reputation: 111
As a workaround I chose to use a different memory allocator (the default is malloc): tcmalloc (Google's memory allocator implementation), which resolved my issue with no performance penalty.
(This is an ugly deployment file, but it's kept inline for simplicity.)
Kubernetes deployment for TensorFlow Serving:
spec:
  containers:
  - name: tensorserving
    image: tensorflow/serving:1.14.0-gpu
    # Install tcmalloc at container start, then launch the model server with it preloaded.
    command: ["/bin/sh", "-c"]
    args:
      - >-
        apt-get update &&
        apt-get install -y google-perftools &&
        LD_PRELOAD=/usr/lib/libtcmalloc.so.4 tensorflow_model_server
        --port=8500 --rest_api_port=8501
        --monitoring_config_file=/etc/config/monitoring_config
        --enable_batching
        --batching_parameters_file=/etc/config/batching_parameters
        --model_config_file=/etc/config/model_config
Upvotes: 3