Reputation: 2263
My requirement:
Make the object detection inference task run on the GPU using TensorFlow.
Current status:
I am using an AWS GPU instance (p2.xlarge) for training as well as for inference. The training part runs fine on the GPU; no problem there. (Graphics card: Tesla M60)
For getting predictions, I have created a Flask server that wraps the TensorFlow detection code with some additional logic. I am going to deploy this service (Flask + TensorFlow) as a Docker container. The base image I am using is tensorflow/tensorflow:1.12.0-gpu-py3. My Dockerfile looks something like this:
FROM tensorflow/tensorflow:1.12.0-gpu-py3
COPY ./app /app
COPY ./requirements.txt /app
RUN pip3 install -r /app/requirements.txt
RUN mkdir /app/venv
WORKDIR /app
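# note: 'export' in a RUN step only applies to that build step;
# it does not persist into the running container (ENV would)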
RUN export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
ENTRYPOINT ["python3", "/app/main.py"]
ENV LISTEN_PORT 8080
EXPOSE 8080
I am able to deploy this by:
docker run --runtime=nvidia --gpus all --name <my-long-img-name>
-v <somepath>:<anotherpath> -p 8080:8080 -d <my-long-img-name>
and can successfully make calls to the endpoints on port 8080 from Postman.
Basically, what I mean is: all the drivers are set up properly.
One of the endpoints in Flask looks like this (for testing whether the GPU is being used or not):
@app.route("/testgpu", methods=["GET"])
def testgpu():
    import tensorflow as tf
    with tf.device('/gpu:0'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b)
    with tf.Session() as sess:
        print(sess.run(c))
When I call this endpoint I get no errors (if no GPU were detected, it would throw an error), so the GPU is detected for this snippet. Yay!
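As an additional check (not part of the snippet above, just a minimal sketch using the same TF 1.x session API), setting log_device_placement=True makes TensorFlow print the device every op is actually assigned to, which confirms whether the matmul really lands on GPU:0:

import tensorflow as tf

# Same toy computation as above, but with per-op placement logging enabled
config = tf.ConfigProto(log_device_placement=True)
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
with tf.Session(config=config) as sess:
    print(sess.run(c))  # console then shows lines like "MatMul: ... /device:GPU:0"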
I also added these 2 lines to my main code execution flow:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
and it outputs:
Local devices :
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17661279486087266140
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 9205152708262911170
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 3134142118233627849
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7447009690
locality {
bus_id: 1
links {
}
}
incarnation: 6613138223738633761
physical_device_desc: "device: 0, name: Tesla M60, pci bus id: 0000:00:1e.0, compute capability: 5.2"
]
Yay again, the GPU is detected. Even the TensorFlow logs show the GPU being picked up:
2019-11-18 08:45:29.944580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-18 08:45:29.944603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-11-18 08:45:29.944611: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-11-18 08:45:29.944721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7101 MB memory) -> physical GPU (device: 0, name: Tesla M60, pci bus id: 0000:00:1e.0, compute capability: 5.2)
Everything seems smooth up to this point, but the main part that should be running on the GPU is not using it; it runs on the CPU instead. Besides the /testgpu endpoint mentioned above, there is another endpoint (let's say /getpredictions) that runs the detection and returns the output.
The problem:
Whenever I call /getpredictions from Postman on port 8080, it uses the CPU instead of the GPU and returns the output in around 30+ seconds.
Is there anything missing here? Any workarounds?
Let me know if I need to add some more information to the question.
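For context, the question does not show how main.py builds the inference session. A typical TF 1.x object-detection setup looks roughly like the sketch below; the graph path and tensor names (frozen_inference_graph.pb, image_tensor:0, detection_boxes:0) are assumptions following the TensorFlow Object Detection API conventions, not taken from the question. Enabling log_device_placement on this session is one way to see which device the detection ops actually run on:

import numpy as np
import tensorflow as tf

# Illustrative path -- the real model location is not given in the question
PATH_TO_FROZEN_GRAPH = '/app/model/frozen_inference_graph.pb'

# Load the frozen detection graph
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# log_device_placement prints the device chosen for every op,
# showing whether the detection ops end up on GPU:0 or CPU:0
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(graph=detection_graph, config=config)

# Dummy input; a real request would decode the uploaded image instead
image = np.zeros((1, 300, 300, 3), dtype=np.uint8)
boxes = sess.run('detection_boxes:0', feed_dict={'image_tensor:0': image})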
Upvotes: 3
Views: 2609
Reputation: 21
From the docs, you should add the GPU option when running the container, like this:
docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
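If the GPU still isn't picked up, a quick sanity check from Python inside the container (a minimal sketch) is:

import tensorflow as tf

# True only if TensorFlow can access a CUDA-enabled GPU
print(tf.test.is_gpu_available())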
Upvotes: 2