SUN JIAWEI

Reputation: 117

I want to know the size of bounding box in object-detection api

I have used the TensorFlow Object Detection API (https://github.com/tensorflow/models/tree/master/object_detection), running the tutorial IPython notebook from GitHub in real time.

How can I find the length of a bounding box? I don't know which command to use to calculate the length of the boxes.

Upvotes: 9

Views: 14515

Answers (5)

The following code runs the detector and returns the box locations and confidence scores:

(boxes, scores, classes, num_detections) = sess.run(
          [boxes, scores, classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})

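To iterate through the boxes:

 for i, b in enumerate(boxes[0]):

To get the width and height of each box inside the loop (each box is [ymin, xmin, ymax, xmax], so the size is the difference between the max and min coordinates, in normalized units):

     width = boxes[0][i][3] - boxes[0][i][1]
     height = boxes[0][i][2] - boxes[0][i][0]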
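As a minimal sketch of that loop, assuming the [ymin, xmin, ymax, xmax] ordering described in the other answers: the `boxes` literal below is a made-up stand-in for the sess.run output, and `normalized_sizes` is a hypothetical helper, not part of the API.

```python
# Made-up stand-in for the `boxes` array returned by sess.run:
# shape (1, N, 4), each box is [ymin, xmin, ymax, xmax] in the range 0..1.
boxes = [[
    [0.10, 0.20, 0.50, 0.60],
    [0.25, 0.25, 0.75, 1.00],
]]


def normalized_sizes(batch_boxes):
    """Return (width, height) pairs, as fractions of the image, for the first image."""
    sizes = []
    for ymin, xmin, ymax, xmax in batch_boxes[0]:
        # The extent along each axis is max minus min.
        sizes.append((xmax - xmin, ymax - ymin))
    return sizes


sizes = normalized_sizes(boxes)
for i, (w, h) in enumerate(sizes):
    print("box %d: width=%.2f height=%.2f" % (i, w, h))
```

The values are still fractions of the image size, so multiply by the image width or height to get pixels.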

You can find more details at https://pythonprogramming.net/detecting-distances-self-driving-car/

Upvotes: 0

ITiger

Reputation: 1081

Just to extend Beta's answer:

You can get the predicted bounding boxes from the detection graph. An example of this is given in the tutorial IPython notebook on GitHub, which is where Beta's code snippet comes from. Access the detection_graph and extract the coordinates of the predicted bounding boxes from the 'detection_boxes:0' tensor, as in Beta's answer.

By calling np.squeeze(boxes) you reshape them to (m, 4), where m denotes the number of predicted boxes. You can now access the boxes and compute the length, area, or whatever you want.
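Here is a rough sketch of the squeeze step followed by a vectorized size computation. The array values and the 640x480 image size are made up for illustration; only np.squeeze and basic NumPy arithmetic are used.

```python
import numpy as np

# Made-up detector output with a leading batch dimension: shape (1, 3, 4).
# Each row is [ymin, xmin, ymax, xmax], normalized to 0..1.
boxes = np.array([[
    [0.10, 0.20, 0.50, 0.60],
    [0.25, 0.25, 0.75, 1.00],
    [0.00, 0.00, 1.00, 0.50],
]])

IMG_HEIGHT, IMG_WIDTH = 480, 640  # assumed image size in pixels

squeezed = np.squeeze(boxes, axis=0)   # (1, 3, 4) -> (3, 4)
ymin, xmin, ymax, xmax = squeezed.T    # one array per coordinate

widths_px = (xmax - xmin) * IMG_WIDTH
heights_px = (ymax - ymin) * IMG_HEIGHT
areas_px = widths_px * heights_px
print(widths_px, heights_px, areas_px)
```

Passing axis=0 removes only the batch dimension. A bare np.squeeze(boxes) would also collapse the box dimension if there happened to be exactly one box, giving shape (4,) instead of (1, 4).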

But remember that the predicted box coordinates are normalized! They are in the following order:

[ymin, xmin, ymax, xmax]

So computing the length in pixels, with IMG_WIDTH being the width of your image in pixels, would be something like:

def length_of_bounding_box(bbox):
    return bbox[3]*IMG_WIDTH - bbox[1]*IMG_WIDTH

Upvotes: 12

Gal_M

Reputation: 468

I wrote a full answer on how to find the bounding box coordinates here and thought it might be useful to someone on this thread too.

Google Object Detection API returns bounding boxes in the format [ymin, xmin, ymax, xmax] and in normalised form (full explanation here). To find the (x,y) pixel coordinates we need to multiply the results by width and height of the image. First get the width and height of your image:

width, height = image.size

Then, extract ymin,xmin,ymax,xmax from the boxes object and multiply to get the (x,y) coordinates:

ymin = boxes[0][i][0]*height
xmin = boxes[0][i][1]*width
ymax = boxes[0][i][2]*height
xmax = boxes[0][i][3]*width

Finally, print the coordinates of the box corners:

print('Top left')
print((xmin, ymin))
print('Bottom right')
print((xmax, ymax))
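Putting those steps together, a small helper could look like the sketch below. `to_pixel_box` is a hypothetical name, and rounding to whole pixels is my own choice rather than something the API does.

```python
def to_pixel_box(box, width, height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel values.

    Returns (left, top, right, bottom, box_width, box_height) as ints.
    """
    ymin, xmin, ymax, xmax = box
    left, right = int(round(xmin * width)), int(round(xmax * width))
    top, bottom = int(round(ymin * height)), int(round(ymax * height))
    return left, top, right, bottom, right - left, bottom - top


# Example with a made-up box on an assumed 640x480 image.
result = to_pixel_box([0.25, 0.5, 0.75, 1.0], width=640, height=480)
print(result)  # (320, 120, 640, 360, 320, 240)
```

The last two values are the box width and height in pixels, which is the "size" the question asks about.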

Upvotes: 3

KleinYuan

Reputation: 9

Basically, you can get all of these from the graph:

image_tensor = graph.get_tensor_by_name('image_tensor:0')
boxes = graph.get_tensor_by_name('detection_boxes:0')
scores = graph.get_tensor_by_name('detection_scores:0')
classes = graph.get_tensor_by_name('detection_classes:0')
num_detections = graph.get_tensor_by_name('num_detections:0')

After running the session, boxes[0] contains all the predicted bounding box coordinates in the format [ymin, xmin, ymax, xmax], i.e. [top_left_y, top_left_x, bottom_right_y, bottom_right_x], normalized to the range 0 to 1, which is what you are looking for.

Check out this repo and you may find more details: https://github.com/KleinYuan/tf-object-detection

Upvotes: 0

Beta

Reputation: 1756

You can get the boxes tensor like this:

boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

Do the same for scores and classes.

Then fetch them in sess.run:

(boxes, scores, classes) = sess.run(
              [boxes, scores, classes],
              feed_dict={image_tensor: imageFile})
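Once you have the three arrays, you will probably want to pair them up and ignore weak detections before measuring anything. A minimal sketch, where `confident_detections`, the 0.5 threshold, and the sample values are all my own assumptions:

```python
def confident_detections(boxes, scores, classes, min_score=0.5):
    """Pair up the outputs for the first image and keep those scoring >= min_score."""
    kept = []
    for box, score, cls in zip(boxes[0], scores[0], classes[0]):
        if score >= min_score:
            kept.append({
                "box": list(box),        # [ymin, xmin, ymax, xmax], normalized
                "score": float(score),
                "class_id": int(cls),
            })
    return kept


# Made-up outputs shaped like the sess.run results (batch of one image).
boxes = [[[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 0.1, 0.1]]]
scores = [[0.92, 0.30]]
classes = [[1.0, 18.0]]

detections = confident_detections(boxes, scores, classes)
print(detections)
```

The same function should work on the NumPy arrays that sess.run returns, since it only iterates and indexes.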

Upvotes: 1
