Reputation: 111
I'm deploying the TensorFlow Serving image (tensorflow/serving:latest-gpu) on Kubernetes (GKE) with GPU nodes (K80) hosted on GCP.
Command:
command: ["tensorflow_model_server"]
args: ["--port=8500", "--rest_api_port=8501", "--enable_batching", "--batching_parameters_file=/etc/config/batching_parameters","--model_config_file=/etc/config/model_config"]
Batching parameters:
maxBatchSize: 4096
batchTimeoutMicros: 25000
maxEnqueuedBatches: 16
numBatchThreads: 16
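For reference, TensorFlow Serving reads the file passed to --batching_parameters_file as protocol buffer text format, so on disk the values above correspond to:

    max_batch_size { value: 4096 }
    batch_timeout_micros { value: 25000 }
    max_enqueued_batches { value: 16 }
    num_batch_threads { value: 16 }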
I use --model_config_file to load model versions from a GCS bucket. TensorFlow Serving pulls every new model version and loads it; when that's done it unloads the old one (but it looks like the old version is kept in memory).
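For context, the model_config file looks roughly like this (the bucket and model names are placeholders, not the real ones):

    model_config_list {
      config {
        name: "my_model"
        base_path: "gs://my-bucket/models/my_model"
        model_platform: "tensorflow"
        # Default version policy: serve only the latest version, unloading older ones.
      }
    }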
When I set a memory limit/request below the maximum available resources on the host, the pod ends up OOMKilled because it uses more memory than allowed. But when the limit/request matches the maximum available resources on the host (dedicated node), the memory seems to be flushed to respect that maximum.
Do you know if we can set a memory cap for TensorFlow Serving, or tell it to respect the cgroup memory limits (the ones used by Docker/Kubernetes)?
Can we fully flush old model versions to release their memory?
Also, every time I execute a request the memory usage increases and is never released. Do you have any idea why?
Node info:
7 vCPU
30 GB RAM
1 GPU K80
Model size: ~8 GB
limit/request memory: 20 GB or 30 GB -> OOMKilled after several model version loads
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
No limit/request -> TensorFlow Serving ends up evicted by Kubernetes because of high memory consumption.
Status: Failed
Reason: Evicted
Message: The node was low on resource: memory. Container tensorserving was using 24861136Ki, which exceeds its request of 0.
Thanks,
Regards
Vince
Upvotes: 4
Views: 1520
Reputation: 111
As a workaround I chose to use a different memory allocator (the default is malloc): tcmalloc (Google's memory allocator implementation), which resolved my issue with no performance penalty.
(This is an ugly deployment file, but it's kept inline for simplicity.)
Kubernetes deployment for TensorFlow Serving:
spec:
  containers:
  - name: tensorserving
    image: tensorflow/serving:1.14.0-gpu
    # Install tcmalloc at container start, then launch the model server with it preloaded.
    command: ["/bin/sh", "-c"]
    args:
      - >-
        apt-get update &&
        apt-get install -y google-perftools &&
        LD_PRELOAD=/usr/lib/libtcmalloc.so.4 tensorflow_model_server
        --port=8500 --rest_api_port=8501
        --monitoring_config_file=/etc/config/monitoring_config
        --enable_batching
        --batching_parameters_file=/etc/config/batching_parameters
        --model_config_file=/etc/config/model_config
Upvotes: 3