Borealis

Reputation: 8480

Python multiprocessing script hangs indefinitely

I am working on integrating multiprocessing into some digital image processing workflows. The following script 1) converts image bands into numpy arrays, 2) calculates the normalized difference vegetation index (NDVI), and 3) converts the numpy array back to a raster and writes it to disk. The script is meant to give an idea of the speed improvement from multiprocessing. The first section works properly: it iterates through a workspace, processes each raster, and writes the result to disk (total time = 2 minutes). The second section, which uses multiprocessing, hangs indefinitely and produces no output. Where am I going wrong with the multiprocessing portion of the script?

import arcpy, os, time
from multiprocessing import Pool

arcpy.env.workspace = r'C:\temp\tiles' # Contains 10 images
outws = r'C:\temp\out'

start = time.time()
rasters = arcpy.ListRasters()

for ras in rasters:
    # Calculate NDVI
    red = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_1"))
    nir = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_4"))
    ndvi = (nir - red) / (nir + red)  # NDVI = (NIR - Red) / (NIR + Red)

    # Convert array to raster
    myRaster = arcpy.NumPyArrayToRaster(ndvi,x_cell_size=0.5)
    myRaster.save(os.path.join(outws, "ndvi_" + ras))

end = time.time()

print "%s sec" % (end-start)

#######################################################
start = time.time()
rasters = arcpy.ListRasters()

def process_img(ras):
    outws = r'C:\temp\out2'
    # Calculate NDVI
    red = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_1"))
    nir = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_4"))
    ndvi = (nir - red) / (nir + red)  # NDVI = (NIR - Red) / (NIR + Red)

    # Convert array to raster
    myRaster = arcpy.NumPyArrayToRaster(ndvi,x_cell_size=0.5)
    myRaster.save(os.path.join(outws, "ndvi_" + ras))

pool = Pool(processes=4)
pool.map(process_img, rasters)

end = time.time()

print "%s sec" % (end-start)
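
NDVI is conventionally defined as (NIR - Red) / (NIR + Red). The following is a minimal NumPy sketch of that formula, independent of arcpy and using made-up band values. The function name ndvi and the float64 cast are assumptions for illustration and are not part of the script above. The cast guards against unsigned integer bands wrapping on subtraction and against integer division truncating the result. Pixels where NIR + Red is zero are not handled here.

```python
import numpy as np


def ndvi(red, nir):
    # Cast to float first: unsigned integer bands would wrap on subtraction,
    # and integer division would truncate the ratio.
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    # Parentheses matter: without them only red / nir is divided.
    return (nir - red) / (nir + red)


# Made-up 2x2 bands, for illustration only.
red = np.array([[50, 100], [30, 80]], dtype="uint8")
nir = np.array([[150, 100], [90, 20]], dtype="uint8")
result = ndvi(red, nir)
```

With these values, result holds 0.5, 0.0, 0.5 and -0.6, all within the expected NDVI range of -1 to 1.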

Upvotes: 1

Views: 797

Answers (1)

tdelaney

Reputation: 77397

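The problem is that on Windows, multiprocessing re-imports the script in each child process, which runs all of your top-level code again, including the code that creates the Pool. The result is processes spawning into infinity (or hanging completely). Move all of your script code into an if __name__ == "__main__": block, leaving the worker function at module level so the children can still import it. See the Windows section of the multiprocessing programming guidelines for details.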

import arcpy, os, time
from multiprocessing import Pool

def process_img(ras):
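    # Each worker is a separate process and does not run the __main__ block,
    # so set the input workspace here as well.
    arcpy.env.workspace = r'C:\temp\tiles'
    outws = r'C:\temp\out2'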
    # Calculate NDVI
    red = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_1"))
    nir = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_4"))
    ndvi = (nir - red) / (nir + red)  # NDVI = (NIR - Red) / (NIR + Red)

    # Convert array to raster
    myRaster = arcpy.NumPyArrayToRaster(ndvi,x_cell_size=0.5)
    myRaster.save(os.path.join(outws, "ndvi_" + ras))

if __name__=="__main__":
    arcpy.env.workspace = r'C:\temp\tiles' # Contains 10 images
    outws = r'C:\temp\out'

    start = time.time()
    rasters = arcpy.ListRasters()

    for ras in rasters:
        # Calculate NDVI
        red = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_1"))
        nir = arcpy.RasterToNumPyArray(os.path.join(ras + "/" + "Band_4"))
        ndvi = (nir - red) / (nir + red)  # NDVI = (NIR - Red) / (NIR + Red)

        # Convert array to raster
        myRaster = arcpy.NumPyArrayToRaster(ndvi,x_cell_size=0.5)
        myRaster.save(os.path.join(outws, "ndvi_" + ras))

    end = time.time()

    print "%s sec" % (end-start)

    #######################################################
    start = time.time()
    rasters = arcpy.ListRasters()


    pool = Pool(processes=4)
    pool.map(process_img, rasters)
    # Release the worker processes once all rasters are done
    pool.close()
    pool.join()

    end = time.time()

    print "%s sec" % (end-start)
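
The same pattern can be seen in isolation in the sketch below, which leaves out arcpy so it can run anywhere. The names square and run_pool are hypothetical stand-ins (square plays the role of process_img) and the worker count is arbitrary. It is meant only to show where the guard goes, not as a drop-in replacement for the script above.

```python
from multiprocessing import Pool


def square(n):
    # Hypothetical stand-in for process_img. It lives at module level so
    # that worker processes can find it when they import this module.
    return n * n


def run_pool(values, workers=2):
    # Create the pool, map the work over it, then shut it down cleanly.
    pool = Pool(processes=workers)
    try:
        return pool.map(square, values)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Only the parent process executes this block. A child that re-imports
    # the module skips it, so no additional pools are created.
    print(run_pool([1, 2, 3, 4]))
```

Without the guard, each child would reach the Pool creation again while being started, which is the runaway behaviour described above.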

Upvotes: 3
