Reputation: 9363
I have a for loop that checks some binary conditions and finally writes a file for each match. The problem I have is that the conditions are true for many files (sometimes around 1000 files need to be written), so writing them takes a long time (around 10 minutes). I know I can use Python's multiprocessing to utilise some of my cores.
This is the code that works, but only uses one core.
for i, n in enumerate(halo_param.strip()):
    mask = var1['halo_id'] == n
    newtbdata = tbdata1[mask]
    hdu = pyfits.BinTableHDU(newtbdata)
    hdu.writeto('/home/Documments/file_{0}.fits'.format(i))
I came across the fact that this can be done using Pool from multiprocessing:
if __name__ == '__main__':
    pool = Pool(processes=4)
I would like to know how to do it and utilise at least 4 of my cores.
Upvotes: 0
Views: 189
Reputation: 368914
Restructure the body of the for loop as a function, and pass that function to Pool.map:
from multiprocessing import Pool

def work(arg):
    # Each worker receives one (index, halo_id) pair from enumerate
    i, n = arg
    mask = var1['halo_id'] == n
    newtbdata = tbdata1[mask]
    hdu = pyfits.BinTableHDU(newtbdata)
    hdu.writeto('/home/Documments/file_{0}.fits'.format(i))

if __name__ == '__main__':
    pool = Pool(processes=4)
    pool.map(work, enumerate(halo_param.strip()))
    pool.close()   # no more tasks will be submitted
    pool.join()    # wait for all workers to finish
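To see the pattern in isolation, here is a minimal, self-contained sketch of the same Pool.map structure. The `square` function is a placeholder standing in for the file-writing work; in the real code it would be replaced by the `work` function above, which reads `var1` and `tbdata1` from the parent process:

```python
from multiprocessing import Pool

def square(x):
    # Placeholder for the per-item work (here: a trivial computation)
    return x * x

if __name__ == '__main__':
    # map distributes the iterable across the worker processes and
    # returns the results in the original order
    with Pool(processes=4) as pool:
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Note that, unlike this toy example, functions whose only effect is writing files return nothing, so the return value of `pool.map` can simply be ignored.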
Upvotes: 1