rhy

Reputation: 1

Question about pool.apply_async

I am new to Python and I want to use pool.apply_async() to parallelize my code, but the parameters of pool.apply_async() confuse me.

Here is my code:

import multiprocessing
from matplotlib.backends.backend_pdf import PdfPages

def detect(i, pdf):
    savefig2pdf.save(event['value'][0][5000:6000],
                     event['value'][1][5000:6000],
                     event['value'][2][5000:6000],
                     event['start point index'] + 5000,
                     eventlist[i],
                     p_result,
                     s_arrival,
                     pdf)

if __name__ == '__main__':
    pdf = PdfPages('cut_figure.pdf')
    pool = multiprocessing.Pool(processes=10)    # cap the pool at 10 worker processes
    for i in range(0, len(eventlist)):
        pool.apply_async(detect, (i, pdf))
    pool.close()
    pool.join()
    pdf.close()

If I only pass i, it works. How can I also pass pdf to the worker processes? I need the pdf object to stay open for writing until all the processes are done. Thanks for your help.

Upvotes: 0

Views: 152

Answers (1)

Thomas Moreau

Reputation: 4467

The multiprocessing module relies on pickle to serialize the objects passed between processes, but the pdf object cannot be pickled:

>>> from matplotlib.backends.backend_pdf import PdfPages
>>> import pickle
>>> pdf = PdfPages('cut_figure.pdf')
>>> pickle.dumps(pdf)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-e06adaa58666> in <module>()
----> 1 pickle.dumps(pdf)

TypeError: cannot serialize '_io.BufferedWriter' object

So it is not possible to use multiprocessing with a single pdf object. You can try threading instead to get concurrent execution, since your program appears to be I/O bound (most of the time is spent writing to a file); see the sketch below.
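A minimal sketch of that thread-based approach, with a placeholder plotting body standing in for the asker's savefig2pdf.save(...) call and event data (not shown in the question). Since PdfPages is not documented as thread-safe, writes to the shared pdf are guarded by a lock, and the figures are built with the object-oriented API to avoid pyplot's global state:

import threading
from concurrent.futures import ThreadPoolExecutor

from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.backends.backend_pdf import PdfPages

pdf_lock = threading.Lock()           # guard the shared PdfPages object

def detect(i, pdf):
    # Hypothetical stand-in for the original savefig2pdf.save(...) call:
    # build a figure for event i, then append it to the shared PDF.
    fig = Figure()
    FigureCanvasAgg(fig)              # attach a canvas so savefig works
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(range(10), [i * x for x in range(10)])
    ax.set_title('event %d' % i)
    with pdf_lock:                    # one writer at a time on the shared PDF
        pdf.savefig(fig)

if __name__ == '__main__':
    eventlist = range(20)             # placeholder for the real event list
    with PdfPages('cut_figure.pdf') as pdf:
        with ThreadPoolExecutor(max_workers=10) as executor:
            for i in range(len(eventlist)):
                executor.submit(detect, i, pdf)
        # leaving the executor block waits for all threads to finish
    # leaving the PdfPages block closes the file, like pdf.close()

Because the workers are threads in the same process, pdf never needs to be pickled, and the file stays open until every submitted task has finished.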

Upvotes: 2
