Marcos Monteiro

Reputation: 23

OpenCV code snippet running slower inside Python multiprocessing process

I was doing some tests with multiprocessing to parallelize face detection and recognition, and I came across a strange behaviour: detectMultiScale() (which performs the face detection) ran slower inside a child process than when called directly in the parent process.

Thus, I wrote the code below, in which 10 images are enqueued and the face detection is then performed sequentially with one of two approaches: just calling the detection function, or running it inside a single new process. The execution time of each detectMultiScale() call is printed. Running this code gives me an average of 0.22 s per call with the first approach and 0.54 s with the second, and the total time to process the 10 images is also greater with the second approach.

I don't know why the same code snippet runs slower inside the new process. If only the total time were greater I would understand (given the overhead of setting up a new process), but the per-call slowdown I don't get. For the record, I'm running this on a Raspberry Pi 3B+.

import cv2
import multiprocessing
from time import time, sleep

def detect(face_cascade, img_queue, bnd_queue):
    while True:
        image = img_queue.get()
        if image is not None:
            gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            ti = time()
            ########################################
            faces = face_cascade.detectMultiScale(
                                gray_image,
                                scaleFactor=1.1,
                                minNeighbors=3,
                                minSize=(130, 130))
            ########################################
            tf = time()
            print('det time: ' + str(tf-ti))
                            
            if len(faces) > 0:
                max_bounds = (0,0,0,0)
                max_size = 0
                for (x, y, w, h) in faces:
                    if w*h > max_size:
                        max_size = w*h
                        max_bounds = (x, y, w, h)
            img_queue.task_done()
            bnd_queue.put('bound')
        else:
            img_queue.task_done()
            break


face_cascade = cv2.CascadeClassifier('../lbpcascade_frontalface_improved.xml')
cam = cv2.VideoCapture(0)
cam.set(cv2.CAP_PROP_FRAME_WIDTH, 2592)
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 1944)
cam.set(cv2.CAP_PROP_BUFFERSIZE, 1)

img_queue = multiprocessing.JoinableQueue()

i = 0
while i < 10:
    is_there_frame, image = cam.read()
    if is_there_frame:
        image = image[0:1944, 864:1728]
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        img_queue.put(image)
        i += 1

bnd_queue = multiprocessing.JoinableQueue()
num_process = 1

ti = time()
# MULTIPROCESSING PROCESS APPROACH
for _ in range(num_process):
    p = multiprocessing.Process(target=detect, args=(face_cascade, img_queue, bnd_queue))
    p.start()

for _ in range(num_process):
    img_queue.put(None)
#     
# FUNCTION CALL APPROACH
#img_queue.put(None)
#while not img_queue.empty():
#    detect(face_cascade, img_queue, bnd_queue)

img_queue.join()
tf = time()

print('TOTAL TIME: ' + str(tf-ti))

while not bnd_queue.empty():
    bound = bnd_queue.get()
    if bound != 'bound':
        print('ERROR')
    bnd_queue.task_done()

Upvotes: 0

Views: 337

Answers (1)

Samar Fatima

Reputation: 63

I am having the same issue, and I think the reason is that the task is somewhat I/O-bound, combined with the overhead created by multiprocessing itself. You can also read the article here: https://www.pyimagesearch.com/2019/09/09/multiprocessing-with-opencv-and-python/ The problem you mention specifically with the detectMultiScale() method is the same as mine. I have also tried serializing, and making the variables global and class-level, but nothing helped.

Upvotes: 1
