cv576

Reputation: 1

Delay of displayed images when using cv2.CascadeClassifier() in real time

I am working on a ROS project for Tello drones and I am using this driver. When I only subscribe to the CompressedImage messages from the drone camera and display the images on screen, everything works fine and there is no noticeable delay.

But as soon as I add face detection with cv2.CascadeClassifier, the frames are displayed with a huge delay of about 30 seconds, i.e. each image only appears on screen roughly 30 seconds after it was captured. Does anyone have an idea how this delay can be minimized so the detection works in real time?

Here is the code so far:

#!/usr/bin/env python

import rospy
from sensor_msgs.msg import CompressedImage
import av
import cv2
import numpy
import threading
import traceback


# File-like object that buffers incoming H.264 data so PyAV can decode it
class StandaloneVideoStream(object):
    def __init__(self):
        self.cond = threading.Condition()
        self.queue = []
        self.closed = False

    def read(self, size):
        # Called by PyAV; returns up to `size` bytes of buffered H.264 data
        self.cond.acquire()
        try:
            if len(self.queue) == 0 and not self.closed:
                self.cond.wait(2.0)
            data = bytes()
            while 0 < len(self.queue) and len(data) + len(self.queue[0]) < size:
                data = data + self.queue[0]
                del self.queue[0]
        finally:
            self.cond.release()
        return data

    def seek(self, offset, whence):
        return -1

    def close(self):
        self.cond.acquire()
        self.queue = []
        self.closed = True
        self.cond.notifyAll()
        self.cond.release()

    def add_frame(self, buf):
        self.cond.acquire()
        self.queue.append(buf)
        self.cond.notifyAll()
        self.cond.release()


stream = StandaloneVideoStream()


def callback(msg):
    # Push each incoming H.264 chunk into the stream buffer
    stream.add_frame(msg.data)


def main():
    rospy.init_node('face_detection')

    rospy.Subscriber('/tello/image_raw/h264', CompressedImage, callback)

    container = av.open(stream)

    for frame in container.decode(video=0):
        # Convert each decoded frame to a BGR numpy array for OpenCV
        image_msg = cv2.cvtColor(numpy.array(frame.to_image()), cv2.COLOR_RGB2BGR)

        stop_data = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
        found = stop_data.detectMultiScale(image_msg, minSize=(20, 20))

        amount_found = len(found)

        if amount_found != 0:
            for (x, y, width, height) in found:  
                cv2.rectangle(image_msg, (x, y), (x + width, y + height), (0, 255, 0), 5)

        cv2.imshow('Frame', image_msg)
        cv2.waitKey(1)


if __name__ == '__main__':
    try:
        main()
    except BaseException:
        traceback.print_exc()
    finally:
        stream.close()
        cv2.destroyAllWindows()

EDIT:

When I print the shape of the images (image_msg), I get (720, 960, 3), i.e. height, width and 3 channels.

The driver output below shows the size of the video stream in bytes:

...
    Tello: 15:54:16.106:  Info: video data 599118 bytes 290.2KB/sec
    Tello: 15:54:18.106:  Info: video data 502212 bytes 245.2KB/sec
    Tello: 15:54:20.108:  Info: video data 503748 bytes 245.7KB/sec
    Tello: 15:54:22.109:  Info: video data 503182 bytes 245.6KB/sec
    Tello: 15:54:22.446:  Info: video recv: 1460 bytes 1b00 +103
    Tello: 15:54:22.813:  Info: video recv: 1460 bytes 2400 +173
    Tello: 15:54:23.190:  Info: video recv: 1460 bytes 2f00 +177
    Tello: 15:54:23.554:  Info: video recv: 1460 bytes 3a00 +178
    Tello: 15:54:23.918:  Info: video recv: 1460 bytes 4500 +176
    Tello: 15:54:24.268:  Info: video recv: 1460 bytes 5000 +160
    Tello: 15:54:24.268:  Info: video data 502157 bytes 227.1KB/sec
    Tello: 15:54:24.585:  Info: video recv: 1460 bytes 5c00 +140
    Tello: 15:54:24.917:  Info: video recv: 1460 bytes 6600 +142
    Tello: 15:54:25.266:  Info: video recv: 1460 bytes 7000 +157
    Tello: 15:54:25.545:  Info: video recv: 1460 bytes 7a00 +102
    Tello: 15:54:25.878:  Info: video recv: 1460 bytes 8201 +140
    Tello: 15:54:26.178:  Info: video recv: 1460 bytes 8d00 +102
    Tello: 15:54:26.271:  Info: video data 534194 bytes 260.5KB/sec
...

Upvotes: 0

Views: 197

Answers (1)

JWCS

Reputation: 1211

If this code as written takes about 30 seconds per loop, and it runs significantly faster with the "stop_data =" and "found =" lines commented out, then the detection is your bottleneck. You have three options (in order of severity): 1) change the parameters, 2) change the input data, 3) change the algorithm. I'm assuming you've already tried (1) changing the parameters and that you don't want to (3) change the algorithm, so your only choice is to (2) change the input data.
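
To confirm that, you can time the detection step in isolation. A rough sketch (the cascade path and minSize are copied from your question; 'test_frame.png' is just a placeholder for any saved 720x960 BGR frame from the drone):

    import time

    import cv2

    # Build the cascade once, outside any loop
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

    # 'test_frame.png' is a placeholder for any saved full-resolution frame
    image = cv2.imread('test_frame.png')

    start = time.time()
    found = cascade.detectMultiScale(image, minSize=(20, 20))
    print('detectMultiScale on the full frame took %.3f s, found %d faces'
          % (time.time() - start, len(found)))

If that single call takes longer than the interval between incoming frames, the decoded frames pile up faster than you can process them, which would explain the kind of cumulative delay you're seeing.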

Try downsampling your image to something like 240x360 or 480x720; it should be decently faster. You can use cv2.pyrDown(), a Gaussian-smoothed downsample, which keeps the image smoother than a simple pick-every-nth-pixel downsample.
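
For example, a minimal sketch of that inside your decode loop (only image_msg from your code is assumed; each pyrDown call halves both dimensions, so the detections have to be scaled back up before drawing on the full-size frame):

    # Same cascade as in your code; ideally constructed once, before the loop
    stop_data = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

    # Gaussian-smoothed downsample: 720x960 -> 360x480 (call again for 180x240)
    small = cv2.pyrDown(image_msg)
    scale = 2  # one pyrDown call halves width and height

    # Run the detector on the smaller image
    found = stop_data.detectMultiScale(small, minSize=(20, 20))

    # Scale the detections back up so they line up with the original frame
    for (x, y, width, height) in found:
        cv2.rectangle(image_msg,
                      (x * scale, y * scale),
                      ((x + width) * scale, (y + height) * scale),
                      (0, 255, 0), 5)

Detection cost grows roughly with the number of pixels, so one pyrDown call cuts the work by about a factor of four.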

Upvotes: 0
