Using shared array in multiprocessing

Question

I am trying to run a parallel process in Python in which I need to extract certain polygons from a large array based on some conditions. The large array contains 10k+ indexed polygons.

I pass (array, index) to an extract_polygon function. Based on the index, the function either returns the polygon corresponding to that index or returns None, depending on the conditions defined. The array is never modified; it is only read to look up the polygon for the given index.

Since the array is very large, I am running into an out-of-memory error during parallel processing. How can I avoid that? (In other words, how can I use a shared array effectively in multiprocessing?)

Below is my sample code:

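import time
import multiprocessing as mp

import numpy as np
from scipy import ndimage

def extract_polygon(array, index):

    try:
        # Bounding-box slices of the region whose value equals index
        islays = ndimage.find_objects(array == index)
        poly = array[islays[0][0], islays[0][1]]
        area = np.count_nonzero(poly)

        minArea = 100
        maxArea = 10000

        if (area > minArea) and (area < maxArea):
            return poly
        else:
            return None

    except IndexError:
        # find_objects returned an empty list: index is not in the array
        return None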

# indices is a list of all the polygon indexes in the array.
start = time.time()
pool = mp.Pool(10)
results = pool.starmap(extract_polygon, [(array, index) for index in indices])
pool.close()
pool.join()

Can I use another library, such as Ray, in this case?
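
For context, the pattern being asked about could look roughly like the sketch below. It is only a sketch under assumptions, not a tested solution for this data. It assumes a 2-D integer array, places one copy in a multiprocessing.shared_memory block, and has each worker attach to that block once through the Pool initializer, so the array is not pickled along with every task. The names init_worker, extract_from_shared and run_parallel are hypothetical helpers introduced here. The bounding box is computed with np.nonzero instead of ndimage.find_objects so the sketch depends only on NumPy.

```python
import multiprocessing as mp
from multiprocessing import shared_memory

import numpy as np

MIN_AREA = 100
MAX_AREA = 10000

_shm = None     # worker-side handle; kept so the mapping stays alive
_labels = None  # worker-side NumPy view onto the shared block


def extract_polygon(array, index):
    """Return the bounding-box crop for index, or None if absent or out of range."""
    rows, cols = np.nonzero(array == index)
    if rows.size == 0:
        return None  # index does not occur in the array
    # Copy the crop so the result does not reference the shared buffer.
    poly = array[rows.min():rows.max() + 1, cols.min():cols.max() + 1].copy()
    area = np.count_nonzero(poly)
    return poly if MIN_AREA < area < MAX_AREA else None


def init_worker(shm_name, shape, dtype):
    """Runs once per worker: attach to the existing block instead of copying it."""
    global _shm, _labels
    _shm = shared_memory.SharedMemory(name=shm_name)
    _labels = np.ndarray(shape, dtype=dtype, buffer=_shm.buf)


def extract_from_shared(index):
    """Task function: only the small index is sent to the worker."""
    return extract_polygon(_labels, index)


def run_parallel(array, indices, processes=4):
    """Copy array into shared memory once, then map the indices over a pool."""
    shm = shared_memory.SharedMemory(create=True, size=array.nbytes)
    try:
        # Temporary view used only to fill the block; it is released right away
        # so that shm.close() below does not fail on an exported buffer.
        np.ndarray(array.shape, dtype=array.dtype, buffer=shm.buf)[:] = array
        with mp.Pool(processes, initializer=init_worker,
                     initargs=(shm.name, array.shape, array.dtype)) as pool:
            return pool.map(extract_from_shared, indices)
    finally:
        shm.close()
        shm.unlink()  # free the block once all work is done
```

run_parallel(array, indices) would be called from under an if __name__ == "__main__": guard. Each returned polygon is still pickled back to the parent process, so very large results can add memory pressure of their own.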
