Puchatek

Reputation: 1537

OpenCV + Python + multiprocessing won't work

I think I found a bug in the Python bindings for OpenCV, but since there is always a chance the problem exists between the chair and the keyboard rather than in the code, I thought I would confirm here instead of submitting a ticket right away.

Here is a simple script for processing a bunch of images in parallel:

import cv2
import multiprocessing
import glob
import numpy

def job(path, output):

    image = cv2.imread(path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    output.put(path)

if __name__ == "__main__":

    main_image = cv2.imread("./image.png")
    main_image = cv2.cvtColor(main_image, cv2.COLOR_BGR2GRAY)

    output = multiprocessing.Queue()

    processes = []

    for path in glob.glob("./data/*"):

        process = multiprocessing.Process(
            target=job,
            args=(path, output))

        process.start()
        processes.append(process)

    for process in processes:
        process.join()

    # Collect all results
    results = [output.get() for process in processes]

    print 'Finished'

In this code, results = [output.get() for process in processes] never finishes. Now the really weird part: if I comment out the main_image = cv2.cvtColor(main_image, cv2.COLOR_BGR2GRAY) line, which should have no influence on the parallel computations whatsoever, the script does finish.

Both ./image.png and the paths under ./data/ point to ordinary images, about 20 of them in total. I also tried creating images in memory (numpy.ones([100, 100, 3]).astype(numpy.float32)) and that did not reproduce the bug.
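Roughly, that in-memory test looked like this (a minimal sketch; only the job changes, and the index argument is just a stand-in for the file path):

import cv2
import multiprocessing
import numpy

def job(index, output):

    # build a synthetic image in memory instead of reading one from disk
    image = numpy.ones([100, 100, 3]).astype(numpy.float32)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    output.put(index)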

I have similar code written in C++ and it runs just fine. My environment: OS X 10.10, OpenCV 3.0.0, Python 2.7.

So, am I doing something silly, or does this indeed appear to be a bug in OpenCV that manifests in parallel computations?


Edit: I also tried an implementation using multiprocessing.Pool.map() and the result is the same. Here's the code:

import cv2
import multiprocessing
import glob
import numpy

def job(path):

    image = cv2.imread(path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    return path

if __name__ == "__main__":

    image = cv2.imread("./image.png")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    paths = glob.glob("./data/*")
    pool = multiprocessing.Pool()
    result = pool.map(job, paths)

    print 'Finished'

    for value in result:
        print value

I was able to get correct results with this design for non-OpenCV tasks, so I strongly believe the problem is on the OpenCV side. But please feel free to prove me wrong - I would love that, since it would mean I don't have to resort to C++.
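For comparison, a minimal non-OpenCV version of the same design finishes without any problem (a sketch; the square() function here is just a placeholder for the real work):

import multiprocessing

def square(value):
    # stand-in work function, no OpenCV involved
    return value * value

if __name__ == "__main__":

    pool = multiprocessing.Pool()
    result = pool.map(square, range(20))

    print result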

Upvotes: 4

Views: 3993

Answers (1)

spoorcc

Reputation: 2955

Shouldn't you get() before you join()?

According to the Python docs:

Joining processes that use queues

Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the cancel_join_thread() method of the queue to avoid this behaviour.)

This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.

An example which will deadlock is the following:

from multiprocessing import Process, Queue


def f(q):
    q.put('X' * 1000000)


if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()                    # this deadlocks
    obj = queue.get()

A fix here would be to swap the last two lines (or simply remove the p.join() line).
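Applied to the first script in the question, that means collecting the results before joining. A sketch of just the affected lines; everything else stays the same:

    # drain the queue while the child processes are still alive,
    # so their feeder threads can flush the buffered items
    results = [output.get() for process in processes]

    for process in processes:
        process.join()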

Upvotes: 0
