Reputation: 8480
I am working on integrating multiprocessing into some digital image processing workflows. The following script 1) converts image bands into numpy arrays, 2) calculates normalized difference vegetation index (NDVI) and 3) converts the numpy array back to raster and writes to disk. The script is designed to get an idea of the speed improvement using multiprocessing. The first section works properly by iterating through a workspace, performing the processing on each raster, and writing to disk (Total time = 2 minutes). The second multiprocessing section "hangs" indefinitely and produces no output. Where am I going wrong with the multiprocessing portion of the script?
import arcpy, os, time
from multiprocessing import Pool
arcpy.env.workspace = r'C:\temp\tiles' # Contains 10 images
outws = r'C:\temp\out'
start = time.time()
rasters = arcpy.ListRasters()
for ras in rasters:
    # Calculate NDVI = (NIR - Red) / (NIR + Red)
    red = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_1"))
    nir = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_4"))
    ndvi = (nir - red) / (nir + red)
    # Convert array to raster
    myRaster = arcpy.NumPyArrayToRaster(ndvi, x_cell_size=0.5)
    myRaster.save(os.path.join(outws, "ndvi_" + ras))
end = time.time()
print "%s sec" % (end-start)
#######################################################
start = time.time()
rasters = arcpy.ListRasters()
def process_img(ras):
    outws = r'C:\temp\out2'
    # Calculate NDVI = (NIR - Red) / (NIR + Red)
    red = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_1"))
    nir = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_4"))
    ndvi = (nir - red) / (nir + red)
    # Convert array to raster
    myRaster = arcpy.NumPyArrayToRaster(ndvi, x_cell_size=0.5)
    myRaster.save(os.path.join(outws, "ndvi_" + ras))
pool = Pool(processes=4)
pool.map(process_img, rasters)
end = time.time()
print "%s sec" % (end-start)
Upvotes: 1
Views: 797
Reputation: 77397
The problem is that on Windows, multiprocessing re-imports the script in each child process, so all of your top-level code runs again, including the code that creates the pool. The children then keep spawning processes of their own, or the script hangs outright. Move all of your script code, apart from the imports and the worker function, into an if __name__ == "__main__": clause. See Programming Guide for Windows for details.
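Stripped of the arcpy calls, the guard pattern might look like the sketch below. The names process_item and main, and the tile names, are hypothetical stand-ins; the worker only builds an output name instead of reading bands or saving a raster. The point is just the layout: imports and the worker at module level, everything that starts processes behind the guard.

```python
import time
from multiprocessing import Pool


def process_item(name):
    # Hypothetical stand-in for process_img: build an output name
    # instead of reading bands and saving a raster.
    return "ndvi_" + name


def main():
    # Hypothetical tile names; the real script would use arcpy.ListRasters().
    items = ["tile_%d.tif" % i for i in range(4)]
    start = time.time()
    pool = Pool(processes=2)
    try:
        # map() blocks until every item is done and preserves input order.
        results = pool.map(process_item, items)
    finally:
        # Release the worker processes once the work is finished.
        pool.close()
        pool.join()
    print("%.2f sec" % (time.time() - start))
    return results


if __name__ == "__main__":
    # On Windows each child re-imports this module; the guard keeps the
    # children from calling main() again and starting pools of their own.
    print(main())
```

One consequence worth checking: code inside the guard runs only in the parent process. A setting such as arcpy.env.workspace that is assigned there may therefore not be in effect in the workers on Windows, so it is probably safer to pass full raster paths to the worker or to set the workspace inside it.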
import arcpy, os, time
from multiprocessing import Pool
def process_img(ras):
    outws = r'C:\temp\out2'
    # Calculate NDVI = (NIR - Red) / (NIR + Red)
    red = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_1"))
    nir = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_4"))
    ndvi = (nir - red) / (nir + red)
    # Convert array to raster
    myRaster = arcpy.NumPyArrayToRaster(ndvi, x_cell_size=0.5)
    myRaster.save(os.path.join(outws, "ndvi_" + ras))
if __name__ == "__main__":
    arcpy.env.workspace = r'C:\temp\tiles' # Contains 10 images
    outws = r'C:\temp\out'
    start = time.time()
    rasters = arcpy.ListRasters()
    for ras in rasters:
        # Calculate NDVI = (NIR - Red) / (NIR + Red)
        red = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_1"))
        nir = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_4"))
        ndvi = (nir - red) / (nir + red)
        # Convert array to raster
        myRaster = arcpy.NumPyArrayToRaster(ndvi, x_cell_size=0.5)
        myRaster.save(os.path.join(outws, "ndvi_" + ras))
    end = time.time()
    print "%s sec" % (end-start)
    #######################################################
    start = time.time()
    rasters = arcpy.ListRasters()
    pool = Pool(processes=4)
    pool.map(process_img, rasters)
    end = time.time()
    print "%s sec" % (end-start)
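Separately from the hang, the parentheses in the NDVI expression matter. NDVI is defined as (NIR - Red) / (NIR + Red), while Python evaluates nir - red / nir + red as nir - (red / nir) + red. The sketch below shows the difference with plain floats so that it runs without arcpy; the function name ndvi and the sample values are made up for illustration.

```python
def ndvi(nir, red):
    # Parentheses force the difference and the sum to be evaluated
    # before the division.
    return (nir - red) / (nir + red)


nir, red = 0.5, 0.1  # made-up reflectance values

# Without parentheses this is parsed as nir - (red / nir) + red.
unparenthesized = nir - red / nir + red

print(round(unparenthesized, 4))
print(round(ndvi(nir, red), 4))
```

The same expression works elementwise on NumPy arrays because it uses only arithmetic operators. If the bands come back as integer arrays, converting them to a float type first is probably needed to avoid integer division and unsigned wrap-around.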
Upvotes: 3