SaiBot

Reputation: 3745

Tensorflow Object Detection inference slow on CPU


After training an ssd_inception_v2 model on my custom dataset, I wanted to use it for inference. Since the inference will later run on a device without a GPU, I switched to CPU-only inference. I adapted object_detection_tutorial.ipynb to measure the inference time and ran the following code on a series of images from a video.

with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    while success:
      #print(str(datetime.datetime.now().time()) + " " + str(count))
      #read image
      success,image = vidcap.read()
      #resize image
      image = cv2.resize(image , (711, 400))
      # crop image to fit 690 x 400
      image = image[ : , 11:691]
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image, axis=0)
      #print(image_np_expanded.shape)
      image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
      # Each box represents a part of the image where a particular object was detected.
      boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
      # Each score represents the level of confidence for each of the objects.
      # Score is shown on the result image, together with the class label.
      scores = detection_graph.get_tensor_by_name('detection_scores:0')
      classes = detection_graph.get_tensor_by_name('detection_classes:0')
      num_detections = detection_graph.get_tensor_by_name('num_detections:0')
      before = datetime.datetime.now()
      # Actual detection.
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes, scores, classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      print("This took : " + str(datetime.datetime.now() - before))  
      vis_util.visualize_boxes_and_labels_on_image_array(
          image,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)

      #cv2.imwrite("converted/frame%d.jpg" % count, image)     # save frame as JPEG file
      count += 1

With the following output:
This took : 0:00:04.289925
This took : 0:00:00.909071
This took : 0:00:00.917636
This took : 0:00:00.908391
This took : 0:00:00.896601
This took : 0:00:00.908698
This took : 0:00:00.890018
This took : 0:00:00.896373
.....

Of course, 900 ms per image is not fast enough for video processing. After reading a lot of threads, I see two possible ways to improve this:

  1. Graph Transform Tool: to make the frozen inference graph run faster. (I am hesitant to try this because, as far as I understand, I would have to build TF from source, and I am usually happy with my current installation. A rough sketch of the invocation is below, after this list.)
  2. Replace Feeding: It seems that feed_dict={image_tensor: image_np_expanded} is not a good way to provide the data to the TF Graph. QueueRunner objects could help here.
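
For reference on item 1, as far as I understand the Graph Transform Tool would be invoked roughly like this; the transform list is just an example taken from the documentation, not something I have verified:

bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
    --in_graph=frozen_inference_graph.pb \
    --out_graph=optimized_inference_graph.pb \
    --inputs='image_tensor' \
    --outputs='detection_boxes,detection_scores,detection_classes,num_detections' \
    --transforms='strip_unused_nodes fold_constants fold_batch_norms fold_old_batch_norms'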

So my question is whether the above two improvements have the potential to bring inference to real-time use (10-20 fps), or am I on the wrong path here and should try something else? Any suggestions are welcome.

Upvotes: 4

Views: 4663

Answers (1)

dragon7

Reputation: 1123

Another option is to use a different toolkit for inference, such as OpenVINO. OpenVINO is optimized for Intel hardware, but it should work with any CPU. It improves your model's inference performance by converting it to Intermediate Representation (IR), performing graph pruning, and fusing certain operations into others. At runtime it then uses vectorization.

It's rather straightforward to convert the Tensorflow model to OpenVINO unless you have fancy custom layers. The full tutorial on how to do it can be found here. Some snippets below.

Install OpenVINO

The easiest way to do it is to use pip. Alternatively, you can use this tool to find the best option for your case.

pip install openvino-dev[tensorflow2]
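
To quickly check that the installation worked and see which devices the runtime can use, something like this should do (assuming the openvino.runtime API shown in the inference snippet below):

python -c "from openvino.runtime import Core; print(Core().available_devices)"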

Use Model Optimizer to convert SavedModel model

The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (just change data_type). Run in the command line:

mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP32 --output_dir "model_ir"
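
For example, the FP16 variant mentioned above would be (the output directory name is just an example):

mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP16 --output_dir "model_ir_fp16"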

Run the inference

The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (the graphics integrated into your CPU, such as Intel HD Graphics). If you don't know what the best choice for you is, just use AUTO.

from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get the output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
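
input_image above is assumed to be already preprocessed. Since you read frames with OpenCV, a minimal preprocessing sketch could look like this (the [1, 3, 224, 224] layout and size are taken from the conversion command above; match them to what your converted model actually expects):

import cv2
import numpy as np

success, frame = vidcap.read()                            # BGR frame, as in your loop
resized = cv2.resize(frame, (224, 224))                   # match the converted model's H x W
chw = resized.transpose(2, 0, 1)                          # HWC -> CHW
input_image = np.expand_dims(chw, 0).astype(np.float32)   # add batch dimension; cast if the model expects float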

Disclaimer: I work on OpenVINO.

Upvotes: 1
