Burschken
Burschken

Reputation: 87

Tensorflow Object-API: convert ssd model to tflite and use it in python

I have a hard time to convert a given tensorflow model into a tflite model and then use it. I already posted a question where I described my problem but didn't share the model I was working with, because I am not allowed to. Since I didn't find an answer this way, I tried to convert a public model (ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu).

Here is a colab tutorial from the object detection api. I just run the whole script without changes (its the same model) and downloaded the generated models (with and without metadata). I uploaded them here together with a sample picture from the coco17 train dataset.

I tried to use those models directly in python, but the results feel like garbage.

Here is the code I used, I followed this guide. I changed the indexes for rects, scores and classes because otherwise the results were not in the right format.

#interpreter = tf.lite.Interpreter("original_models/model.tflite")
interpreter = tf.lite.Interpreter("original_models/model_with_metadata.tflite")

interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

size = 640

def draw_rect(image, box):
    y_min = int(max(1, (box[0] * size)))
    x_min = int(max(1, (box[1] * size)))
    y_max = int(min(size, (box[2] * size)))
    x_max = int(min(size, (box[3] * size)))
    
    # draw a rectangle on the image
    cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (255, 255, 255), 2)

file = "images/000000000034.jpg"


img = cv2.imread(file)
new_img = cv2.resize(img, (size, size))
new_img = cv2.cvtColor(new_img, cv2.COLOR_BGR2RGB)

interpreter.set_tensor(input_details[0]['index'], [new_img.astype("f")])

interpreter.invoke()
rects = interpreter.get_tensor(
    output_details[1]['index'])

scores = interpreter.get_tensor(
    output_details[0]['index'])

classes = interpreter.get_tensor(
    output_details[3]['index'])


for index, score in enumerate(scores[0]):
        draw_rect(new_img,rects[0][index])
        #print(rects[0][index])
        print("scores: ",scores[0][index])
        print("class id: ", classes[0][index])
        print("______________________________")


cv2.imshow("image", new_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

This leads to the following console output

scores:  0.20041436
class id:  51.0
______________________________
scores:  0.08925027
class id:  34.0
______________________________
scores:  0.079722285
class id:  34.0
______________________________
scores:  0.06676647
class id:  71.0
______________________________
scores:  0.06626186
class id:  15.0
______________________________
scores:  0.059938848
class id:  86.0
______________________________
scores:  0.058229476
class id:  34.0
______________________________
scores:  0.053791136
class id:  37.0
______________________________
scores:  0.053478718
class id:  15.0
______________________________
scores:  0.052847564
class id:  43.0
______________________________

and the resulting image

model output.

I tried different images from the orinal training dataset and never got good results. I think the output layer is broken or maybe some postprocessing is missing?

I also tried to use the converting method given from the offical tensorflow documentaion.

import tensorflow as tf

saved_model_dir = 'tf_models/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model/'
    # Convert the model
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) # path to the SavedModel directory
tflite_model = converter.convert()
    
# Save the model.
with open('model.tflite', 'wb') as f:
      f.write(tflite_model)

But when I try to use the model, I get a ValueError: Cannot set tensor: Dimension mismatch. Got 640 but expected 1 for dimension 1 of input 0.

Has anyone an idea what I am doing wrong?

Update: After Farmmakers advice, I tried changing the input dimensions of the model generating by the short script at the end. The shape before was:

[{'name': 'serving_default_input_tensor:0',
  'index': 0,
  'shape': array([1, 1, 1, 3], dtype=int32),
  'shape_signature': array([ 1, -1, -1,  3], dtype=int32),
  'dtype': numpy.uint8,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]

So adding one dimension would not be enough. Therefore I used interpreter.resize_tensor_input(0, [1,640,640,3]) . Now it works to feed an image through the net.

Unfortunately I sill can't make any sense of the output. Here is the print of the output details:

[{'name': 'StatefulPartitionedCall:6',
  'index': 473,
  'shape': array([    1, 51150,     4], dtype=int32),
  'shape_signature': array([    1, 51150,     4], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:0',
  'index': 2233,
  'shape': array([1, 1], dtype=int32),
  'shape_signature': array([ 1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:5',
  'index': 2198,
  'shape': array([1], dtype=int32),
  'shape_signature': array([1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:7',
  'index': 493,
  'shape': array([    1, 51150,    91], dtype=int32),
  'shape_signature': array([    1, 51150,    91], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:1',
  'index': 2286,
  'shape': array([1, 1, 1], dtype=int32),
  'shape_signature': array([ 1, -1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:2',
  'index': 2268,
  'shape': array([1, 1], dtype=int32),
  'shape_signature': array([ 1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:4',
  'index': 2215,
  'shape': array([1, 1], dtype=int32),
  'shape_signature': array([ 1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'StatefulPartitionedCall:3',
  'index': 2251,
  'shape': array([1, 1, 1], dtype=int32),
  'shape_signature': array([ 1, -1, -1], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]  

I added the so generated tflite model to the google drive.

Update2: I added a directory to the google drive which contains a notebook that uses the full size model and produces the correct output. If you execute the whole notebook it should produce the following image to your disk.

enter image description here

Upvotes: 0

Views: 1382

Answers (2)

cricket
cricket

Reputation: 31

I followed the exact procedure you show (standard procedure mentioned in tensorflow doc).

First the output returned by the tflite model, unlike described in the official documentation, has a different format (different indexing).

  boxes = get_output_tensor(interpreter, 1)
  classes = get_output_tensor(interpreter, 3)
  scores = get_output_tensor(interpreter, 0)
  count = int(get_output_tensor(interpreter, 2))

Second, the number of retuned bounding boxes are always 10, and I can't figure out how to change that to the custom number of objects in my dataset.

Finally, the way I solved it is just by retrieving the bounding boxes using index 1, and filtering them out using the scores. However, the results I get are far from the original model's. Moreover, the tflite model takes way more time than the original model, opposite of what tflite is meant for. Probably, because I run it on my laptop, therefore x86 instruction set (tflite is optimize to run on ARM CPUs instead (mobile, raspberry pi)).

Upvotes: 1

Taehee Jeong
Taehee Jeong

Reputation: 146

For the models from Object Detection APIs to work well with TFLite, you have to convert it to TFLite-friendly graph that has custom op.

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md

(TF1 doc)

You can also try using TensorFlow Lite Model Maker

Upvotes: 1

Related Questions