Reputation: 2263
My requirement:
Make the object detection inference task run on the GPU using TensorFlow.
Current status:
I am using an AWS GPU instance (p2.xlarge) for training as well as for inference. The training part runs fine on the GPU; no problem there. (Graphics card: Tesla M60)
For getting predictions, I have created a Flask server that wraps the TensorFlow detection code with some additional logic. I am going to deploy this service (Flask + TensorFlow) as a Docker container. The base image I am using is tensorflow/tensorflow:1.12.0-gpu-py3. My Dockerfile looks something like this:
FROM tensorflow/tensorflow:1.12.0-gpu-py3
COPY ./app /app
COPY ./requirements.txt /app
RUN pip3 install -r /app/requirements.txt
RUN mkdir /app/venv
WORKDIR /app
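# note: 'export' in a RUN step only applies to that build step;
# it does not persist into the running container (ENV would)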
RUN export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
ENTRYPOINT ["python3", "/app/main.py"]
ENV LISTEN_PORT 8080
EXPOSE 8080
I am able to deploy this by:
docker run --runtime=nvidia --gpus all --name <my-long-img-name>
-v <somepath>:<anotherpath> -p 8080:8080 -d <my-long-img-name>
and can successfully make calls to the endpoints on port 8080 from Postman.
Basically, what I mean is: all the drivers are set up properly.
One of the endpoints in Flask looks like this (for testing whether the GPU is being used or not):
@app.route("/testgpu", methods=["GET"])
def testgpu():
    import tensorflow as tf
    with tf.device('/gpu:0'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b)
    with tf.Session() as sess:
        print(sess.run(c))
When I call this endpoint I get no errors (if no GPU were detected, it would throw an error), so the GPU is detected for this snippet. Yay!
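As an additional check (not part of the snippet above, just a minimal sketch using the same TF 1.x session API), setting log_device_placement=True makes TensorFlow print the device every op is actually assigned to, which confirms whether the matmul really lands on GPU:0:

import tensorflow as tf

# Same toy computation as above, but with per-op placement logging enabled
config = tf.ConfigProto(log_device_placement=True)
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
with tf.Session(config=config) as sess:
    print(sess.run(c))  # console then shows lines like "MatMul: ... /device:GPU:0"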
I also added these 2 lines to my main code execution flow:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
and it outputs:
Local devices :
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17661279486087266140
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 9205152708262911170
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 3134142118233627849
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7447009690
locality {
bus_id: 1
links {
}
}
incarnation: 6613138223738633761
physical_device_desc: "device: 0, name: Tesla M60, pci bus id: 0000:00:1e.0, compute capability: 5.2"
]
Yay again, the GPU is detected. Even the TensorFlow logs show the GPU being picked up:
2019-11-18 08:45:29.944580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-18 08:45:29.944603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-11-18 08:45:29.944611: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-11-18 08:45:29.944721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7101 MB memory) -> physical GPU (device: 0, name: Tesla M60, pci bus id: 0000:00:1e.0, compute capability: 5.2)
Everything seems smooth up to this point, but the main part that should be running on the GPU is not using it; it runs on the CPU instead. Besides the /testgpu endpoint mentioned above, there is another endpoint (let's say /getpredictions) that runs the detection and returns the output.
The problem:
Whenever I call /getpredictions from Postman on port 8080, it uses the CPU instead of the GPU and returns the output in around 30+ seconds.
Is there anything missing here? Any workarounds?
Let me know if I need to add some more information to the question.
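For context, the question does not show how main.py builds the inference session. A typical TF 1.x object-detection setup looks roughly like the sketch below; the graph path and tensor names (frozen_inference_graph.pb, image_tensor:0, detection_boxes:0) are assumptions following the TensorFlow Object Detection API conventions, not taken from the question. Enabling log_device_placement on this session is one way to see which device the detection ops actually run on:

import numpy as np
import tensorflow as tf

# Illustrative path -- the real model location is not given in the question
PATH_TO_FROZEN_GRAPH = '/app/model/frozen_inference_graph.pb'

# Load the frozen detection graph
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# log_device_placement prints the device chosen for every op,
# showing whether the detection ops end up on GPU:0 or CPU:0
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(graph=detection_graph, config=config)

# Dummy input; a real request would decode the uploaded image instead
image = np.zeros((1, 300, 300, 3), dtype=np.uint8)
boxes = sess.run('detection_boxes:0', feed_dict={'image_tensor:0': image})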
Upvotes: 3
Views: 2609
Reputation: 21
From the docs, you should add the GPU option when running the container, like this:
docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
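If the GPU still isn't picked up, a quick sanity check from Python inside the container (a minimal sketch) is:

import tensorflow as tf

# True only if TensorFlow can access a CUDA-enabled GPU
print(tf.test.is_gpu_available())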
Upvotes: 2