Srivatsan

Reputation: 9363

Use multiprocessing for a for loop, Python

I have a for loop that checks some binary conditions and writes a file when they hold. The problem is that the conditions are true for many files (sometimes around 1000 files need to be written), so writing them takes a long time (around 10 minutes). I know I can use Python's multiprocessing to utilise some of my cores.

This is the code that works, but only uses one core.

for i, n in enumerate(halo_param.strip()):
    mask = var1['halo_id'] == n        # rows belonging to this halo id
    newtbdata = tbdata1[mask]
    hdu = pyfits.BinTableHDU(newtbdata)
    hdu.writeto('/home/Documments/file_{0}.fits'.format(i))

I found that this can be done using Pool from multiprocessing:

if __name__ == '__main__':
    pool = Pool(processes=4)

I would like to know how to do this and utilise at least 4 of my cores.

Upvotes: 0

Views: 189

Answers (1)

falsetru

Reputation: 368914

Restructure the body of the for loop into a function, and pass that function to Pool.map:

from multiprocessing import Pool

def work(arg):
    # Unpack the (index, halo id) pair produced by enumerate().
    i, n = arg
    mask = var1['halo_id'] == n        # rows belonging to this halo id
    newtbdata = tbdata1[mask]
    hdu = pyfits.BinTableHDU(newtbdata)
    hdu.writeto('/home/Documments/file_{0}.fits'.format(i))

if __name__ == '__main__':
    pool = Pool(processes=4)                       # one pool of 4 worker processes
    pool.map(work, enumerate(halo_param.strip()))  # distribute the items across workers
    pool.close()                                   # no more tasks will be submitted
    pool.join()                                    # wait for the workers to finish
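On Python 3.3+, Pool can also be used as a context manager, which saves you the explicit close()/join() calls. Here is a minimal self-contained sketch of the same pattern, with a placeholder print standing in for the FITS-specific code (halo_param, var1, and tbdata1 come from your question and are not defined here):

from multiprocessing import Pool

def work(arg):
    i, n = arg
    # Placeholder for the real per-item work; in your case this
    # would build a BinTableHDU and write file_{i}.fits.
    print('would write file_{0}.fits for id {1}'.format(i, n))

if __name__ == '__main__':
    ids = ['a', 'b', 'c', 'd']            # stand-in for halo_param.strip()
    with Pool(processes=4) as pool:       # Python 3.3+
        pool.map(work, enumerate(ids))
    # map() blocks until every item is done, so it is safe for the
    # with block to tear down the pool on exit.

One thing to keep in mind: work runs in separate processes, so anything it uses (var1, tbdata1, pyfits) must be defined or imported at module level, or passed in through the argument.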

Upvotes: 1
