bengo

Reputation: 365

Parallel computing with Python wrapped C++ classes: performing operations in every element of a vector

Consider C++ classes. The first one is a branch class:

class Branch{
  map<string,double> properties;
};

i.e. a branch object is only characterised by its properties, which are stored in a map. Each property has a name and is associated with a double value. The second one is a tree class, composed of many branches:

class Tree{
  vector<Branch*> tree;

  void addBranch(int index); // adds a branch to the tree vector at index
  void removeBranch(int index); // removes the branch at index and its descendants
  double getProperty(int index, string name); // gets the value of property name of the branch at index
  void addProperty(int index, string name, double value);
  void setProperty(int index, string name, double value);
};

Now assume that the tree class is wrapped using Cython. Then, in Python we can manipulate a PyTree object, add and remove branches, and manipulate the properties of every branch. Consider the following Python program:

tree=PyTree()
for i in range(TotalTime):
  k=random.random()
  if k>0.1:
    tree.addBranch(random_index) #it is not important how we get the index
    tree.addProperty(random_index,'prop1',1)
    tree.addProperty(random_index,'prop2',1)
  k=random.random()
  if k>0.9:
    tree.removeBranch(random_index)
  for j in range(NumberOfBranches): #it's not important how we get the number of branches
    operation1(j,'prop1') # assume these functions are defined
    operation2(j,'prop2')

In this program I add and remove branches randomly. Each branch has two properties prop1 and prop2. There's an operation1 function performing an operation involving getProperty and setProperty functions on 'prop1', and an operation2 function doing the same on 'prop2'.
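For concreteness, here is a minimal Python sketch of what such an operation might look like. The stub tree class and the exact transformation are hypothetical, just to illustrate the getProperty/setProperty round trip (the real functions would call into the wrapped C++ Tree):

```python
class StubTree:
    """Hypothetical stand-in for the wrapped PyTree, for illustration only."""
    def __init__(self):
        self._props = {}

    def getProperty(self, index, name):
        return self._props[(index, name)]

    def setProperty(self, index, name, value):
        self._props[(index, name)] = value


def operation1(tree, index, name):
    # read the property, transform it (placeholder arithmetic), write it back
    value = tree.getProperty(index, name)
    tree.setProperty(index, name, value + 1.0)
```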

What I want is to have one processor (or thread) performing each operation. Since the program is continuously calling an external C++ library, should I use threading instead of multiprocessing?

How should I implement the parallelization? I tried to do it following this guide: https://www.quantstart.com/articles/parallelising-python-with-threading-and-multiprocessing, but with both threading and multiprocessing I get a slower program...

Upvotes: 1

Views: 462

Answers (1)

DavidW

Reputation: 30913

I'd recommend using threading - since the main work is being done in your wrapped C++ functions, you should be able to release the GIL and have everything work in parallel reliably.

A good general rule is to create new threads as few times as possible (this can be a somewhat slow operation) and then feed them data until you're done. You say in the comments that you aren't at all concerned about the order the operations are performed in (good!). With that in mind I'm going to suggest feeding lambda functions containing the operations into a Queue and having the threads pick them off and run them:

The following code then goes inside your for i in range(TotalTime): loop in place of the for j in range(NumberOfBranches)::

import threading
from queue import Queue  # "from Queue import Queue" on Python 2

q = Queue()

def thread_func():
    """A function for each thread to run.

    It waits to get an item off the queue and then runs that item.
    None is used to indicate that we're done. If we get None, we
    put it back on the Queue to ensure every thread terminates."""
    while True:
        f = q.get()
        if f is None:
            q.put(None)
            return
        f()

# make and start the threads.
# "no_threads" should be something similar to the number of cores you have.
# 4-8 might be a good number to try?
threads = [threading.Thread(target=thread_func) for n in range(no_threads)]
[t.start() for t in threads]

# put the required operations on the Queue
for j in range(NumberOfBranches):
    # note the awkward default-argument syntax to
    # ensure we capture j: http://stackoverflow.com/a/7514158/4657412
    q.put(lambda j=j: operation1(j, "prop1"))
    q.put(lambda j=j: operation2(j, "prop2"))

q.put(None)  # to terminate

# wait for threads to finish
[t.join() for t in threads]

In order to allow the threads to actually work in parallel you'll need to ensure that the GIL is released inside your Cython wrapper:

def operation1(a, b):
    with nogil:
        cplusplus_operation1(a, b)
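Note that code inside a `with nogil` block can only call functions that are themselves declared `nogil`, and can only operate on C-level types (not Python objects), so the arguments must be converted before entering the block. A sketch of what the extern declaration might look like (the header name and signature are assumptions):

```cython
from libcpp.string cimport string

cdef extern from "tree.h":
    void cplusplus_operation1(int index, string name) nogil
```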

I have two concerns with using multiprocessing:

  1. it isn't hugely portable (it works differently on Windows).
  2. if operation1/2 modifies the C++ data then you may find the modified data isn't shared across processes without special effort on your part (which would defeat the point of doing the operation!)

Upvotes: 1
