Marcos Monteiro

Reputation: 23

OpenCV code snippet running slower inside Python multiprocessing process

I was doing some tests with multiprocessing to parallelize face detection and recognition, and I came across a strange behaviour: detectMultiScale() (which performs the face detection) ran slower inside a child process than when called directly in the parent process.

Thus, I wrote the code below, in which 10 images are enqueued and the face detection is then performed sequentially with one of two approaches: just calling the detection function, or running it inside a single new process. The execution time of each detectMultiScale() call is printed. Running this code gives me an average of 0.22 s per call with the first approach and 0.54 s with the second, and the total time to process the 10 images is also greater with the second approach.

I don't know why the same code snippet runs slower inside the new process. If only the total time were greater I would understand (given the overhead of setting up a new process), but the per-call slowdown I don't get. For the record, I'm running this on a Raspberry Pi 3B+.

import cv2
import multiprocessing
from time import time, sleep

def detect(face_cascade, img_queue, bnd_queue):
    while True:
        image = img_queue.get()
        if image is not None:
            gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            ti = time()
            ########################################
            faces = face_cascade.detectMultiScale(
                                gray_image,
                                scaleFactor=1.1,
                                minNeighbors=3,
                                minSize=(130, 130))
            ########################################
            tf = time()
            print('det time: ' + str(tf-ti))
                            
            if len(faces) > 0:
                max_bounds = (0,0,0,0)
                max_size = 0
                for (x, y, w, h) in faces:
                    if w*h > max_size:
                        max_size = w*h
                        max_bounds = (x, y, w, h)
            img_queue.task_done()
            bnd_queue.put('bound')
        else:
            img_queue.task_done()
            break


face_cascade = cv2.CascadeClassifier('../lbpcascade_frontalface_improved.xml')
cam = cv2.VideoCapture(0)
cam.set(cv2.CAP_PROP_FRAME_WIDTH, 2592)
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 1944)
cam.set(cv2.CAP_PROP_BUFFERSIZE, 1)

img_queue = multiprocessing.JoinableQueue()

i = 0
while i < 10:
    is_there_frame, image = cam.read()
    if is_there_frame:
        image = image[0:1944, 864:1728]
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        img_queue.put(image)
        i += 1

bnd_queue = multiprocessing.JoinableQueue()
num_process = 1

ti = time()
# MULTIPROCESSING PROCESS APPROACH
for _ in range(num_process):
    p = multiprocessing.Process(target=detect, args=(face_cascade, img_queue, bnd_queue))
    p.start()

for _ in range(num_process):
    img_queue.put(None)
#     
# FUNCTION CALL APPROACH
#img_queue.put(None)
#while not img_queue.empty():
#    detect(face_cascade, img_queue, bnd_queue)

img_queue.join()
tf = time()

print('TOTAL TIME: ' + str(tf-ti))

while not bnd_queue.empty():
    bound = bnd_queue.get()
    if bound != 'bound':
        print('ERROR')
    bnd_queue.task_done()

Upvotes: 0

Views: 337

Answers (1)

Samar Fatima

Reputation: 63

I am having the same issue, and I think the reason is that the task is somewhat I/O-bound, combined with the overhead created by multiprocessing itself. You can also read the article here: https://www.pyimagesearch.com/2019/09/09/multiprocessing-with-opencv-and-python/ The problem you mention specifically with the detectMultiScale() method is the same as mine. I have also tried serializing, and making the variables global and class-level, but nothing helped.

Upvotes: 1
