rajas_m

Reputation: 45

How to get the coordinates (or even the center point) of predicted bounding boxes in object detection on a video using TensorFlow

I am doing real-time object detection on a video. My model works fine, but as a final step I want to print the coordinates of the predicted bounding boxes. Since the input is a video, I want to print these coordinates continuously, for every frame. This is the code where the visualization happens:

vis_util.visualize_boxes_and_labels_on_image_array(
   image_np,
   np.squeeze(boxes),
   np.squeeze(classes).astype(np.int32),
   np.squeeze(scores),
   category_index,
   use_normalized_coordinates=True,
   line_thickness=3,)

I tried print(boxes), but it printed many arrays, for example:

[0.0000000e+00 0.0000000e+00 6.3184214e-01 3.3072531e-01]
[7.6686603e-01 4.3631636e-02 1.0000000e+00 2.1988428e-01]
[5.0896335e-01 4.1433451e-01 1.0000000e+00 1.0000000e+00]
[1.6146693e-01 5.1699680e-01 8.4259009e-01 9.7707957e-01]
[0.0000000e+00 1.1906862e-02 7.6165682e-01 5.2043945e-01]
[9.5170856e-01 8.5603885e-02 1.0000000e+00 2.2153431e-01]
[4.1733772e-02 8.4026903e-01 3.4761459e-01 9.9046725e-01]
...
(and many more)

I want these predicted bounding box coordinates printed to the console. Can someone help me with this?

Upvotes: 2

Views: 1809

Answers (3)

rajas_m

Reputation: 45

I figured out the solution I was looking for by adding the piece of code below. Anyone who runs into a similar problem can refer to it.

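height, width = image_np.shape[:2]
# boxes has shape (1, N, 4); each row is [ymin, xmin, ymax, xmax], normalized to [0, 1]
box = np.squeeze(boxes)
for i in range(len(box)):
   ymin = int(box[i,0]*height)
   xmin = int(box[i,1]*width)
   ymax = int(box[i,2]*height)
   xmax = int(box[i,3]*width)
   print(xmin, ymin, xmax, ymax)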

I tried this and it worked for me. I confirmed it by drawing another rectangle with these coordinates. It overlapped the predicted bounding boxes, which confirms that the coordinates are correct.

Upvotes: 1

Jitesh Malipeddi

Reputation: 2385

You can get the bounding box coordinates in the following way:

for box in boxes[0]:
    xmin = box[1]*width
    ymin = box[0]*height
    xmax = box[3]*width
    ymax = box[2]*height

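where width and height are the width and height of the image in pixels. They can be obtained with height, width, channels = image_np.shape. Each box is stored as [ymin, xmin, ymax, xmax] in normalized coordinates, which is why index 1 is scaled by the width and index 0 by the height.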
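
Since the question also asks for the center point, here is a minimal sketch of the same conversion that returns both the pixel corners and the box centers. The helper name boxes_to_pixels and the sample box are made up for illustration; it assumes boxes holds normalized [ymin, xmin, ymax, xmax] rows, as the TF OD API returns them.

```python
import numpy as np

def boxes_to_pixels(boxes, image_shape):
    """Hypothetical helper: normalized [ymin, xmin, ymax, xmax] boxes
    to integer pixel corners [xmin, ymin, xmax, ymax] and centers [cx, cy]."""
    height, width = image_shape[:2]
    # Accept (1, N, 4) or (N, 4); reshape also handles the single-box case.
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    ymin, xmin, ymax, xmax = boxes.T
    corners = np.stack(
        [xmin * width, ymin * height, xmax * width, ymax * height], axis=1
    )
    centers = np.stack(
        [(corners[:, 0] + corners[:, 2]) / 2, (corners[:, 1] + corners[:, 3]) / 2],
        axis=1,
    )
    # astype(int) truncates toward zero, like int() in the loop version.
    return corners.astype(int), centers.astype(int)

# Made-up example: one box on a 480x640 frame (stand-in for image_np.shape).
sample_boxes = np.array([[[0.25, 0.5, 0.75, 1.0]]])
corners, centers = boxes_to_pixels(sample_boxes, (480, 640, 3))
print(corners[0], centers[0])
```

Inside the video loop you would call it once per frame with the boxes array and image_np.shape, then print or log the result.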

Upvotes: 3

Nicolas Gervais

Reputation: 36584

Somewhere in your config file there is a parameter that sets the maximum number of boxes to predict (max_boxes_to_predict or something similar). The TF OD API always outputs this number of boxes, along with a confidence score for each one, and most of those scores should be close to 0. It is up to you to choose which boxes to keep, based on detection_scores.
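
A rough sketch of that filtering step, assuming boxes, scores and classes are the arrays from the question (shapes (1, N, 4), (1, N) and (1, N)). The helper name keep_confident and the 0.5 cutoff are my own choices, and the sample arrays are made up. As far as I know, 0.5 is also the default min_score_thresh of the visualization utility, so this should roughly match what gets drawn.

```python
import numpy as np

def keep_confident(boxes, scores, classes, min_score=0.5):
    """Hypothetical helper: keep only detections scoring at least min_score."""
    boxes = np.asarray(boxes).reshape(-1, 4)
    scores = np.asarray(scores).reshape(-1)
    classes = np.asarray(classes).reshape(-1).astype(np.int32)
    mask = scores >= min_score
    return boxes[mask], scores[mask], classes[mask]

# Made-up detections: two confident boxes and one near-zero filler box.
sample_boxes = np.array([[[0.1, 0.1, 0.5, 0.5],
                          [0.2, 0.3, 0.9, 0.8],
                          [0.0, 0.0, 1.0, 1.0]]])
sample_scores = np.array([[0.9, 0.6, 0.01]])
sample_classes = np.array([[1.0, 3.0, 1.0]])

kept_boxes, kept_scores, kept_classes = keep_confident(
    sample_boxes, sample_scores, sample_classes
)
for box, score, cls in zip(kept_boxes, kept_scores, kept_classes):
    print(cls, score, box)
```

Only the boxes that survive this filter need to be converted to pixel coordinates and printed, which keeps the console output down to the objects actually detected in each frame.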

Upvotes: 2
