How can I parallelize code with Numba when an array must be filled with differently shaped arrays produced in each loop iteration?

Question

I have written code that works correctly, and I have used Numba's njit to make it faster. I would like to know whether it can be parallelized to make it as fast as possible. The values in ints differ, so the inner loop produces lists of different lengths in each iteration of the main loop. Because of this, a pre-allocated zeros array cannot be filled in the loops as usual; it would need dtype object, which is not useful here. What is the most efficient practical way to parallelize this code with Numba in Python, and are there any limitations in doing so? A reproducible example follows:

import numpy as np
import numba as nb

n = 10
a = np.random.rand(n, 2)
a_min = a - 0.7           # 0.7 is just an arbitrary float used to create the lower limits
a_max = a + 0.7
ints = np.random.randint(17, 21, n, dtype=np.int32)

@nb.njit("float64[:, ::1], float64[:, ::1], int32[::1]")
def rand_(a_min, a_max, ints):
    np.random.seed(85)
    main_ = []                       # np.zeros((len(ints),), dtype=object)
    for i, j in enumerate(ints):
        min_x, min_y = a_min[i]
        max_x, max_y = a_max[i]
        temp_ = []                   # np.zeros((j, 2), dtype=nb.float64)
        for m in range(j):
            rand_x = np.random.uniform(min_x, max_x, 1)
            rand_y = np.random.uniform(min_y, max_y, 1)
            temp_.append([rand_x, rand_y])        

        main_.append(temp_)          # temp_ shape: (j, 2); j will be different in each main loop
    
    return main_

Will parallelization affect the reproducibility of the results, since np.random.uniform is used?

Explanation of the process:
a_min is an array (shape (n, 2)) that contains the lower limits of x and y in each row, and a_max contains the upper limits. In each iteration of the main loop, the inner loop produces a random float between the lower and upper limits for each of x and y, and stores the two floats as a new combination (x, y) in a list or array. The integer value in ints (e.g. 9) determines how many of these combinations are created in that iteration; they are then grouped and stored in another list or array.

aerobiomat · Accepted Answer

Your code produces a list of lists of lists of arrays, where each array contains a single floating-point value.

Instead, it would be reasonable to produce a list of 2d arrays. Also, you can create each column in the array using a single call to np.random.uniform():

def rand_2(a_min, a_max, ints):
    np.random.seed(85)
    main_ = []
    for i, length in enumerate(ints):
        min_x, min_y = a_min[i]
        max_x, max_y = a_max[i]
        temp_ = np.empty((length, 2))
        temp_[:, 0] = np.random.uniform(min_x, max_x, length)
        temp_[:, 1] = np.random.uniform(min_y, max_y, length)
        main_.append(temp_)
    return main_

Without Numba's jit, this code is about twice as fast as the original for n=100000. With jit, it is around 20x faster than the original. Because the random generator is called in a different order, the results are not the same as the original's, but they are consistent between calls.
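The consistency between calls can be checked with a small sketch. It runs rand_2 without jit (which the paragraph above says is possible), and the input sizes are simply those from the question; nothing here is a benchmark.

```python
import numpy as np

def rand_2(a_min, a_max, ints):
    np.random.seed(85)                      # reseeded on every call
    main_ = []
    for i, length in enumerate(ints):
        min_x, min_y = a_min[i]
        max_x, max_y = a_max[i]
        temp_ = np.empty((length, 2))
        temp_[:, 0] = np.random.uniform(min_x, max_x, length)
        temp_[:, 1] = np.random.uniform(min_y, max_y, length)
        main_.append(temp_)
    return main_

# Inputs as in the question
n = 10
a = np.random.rand(n, 2)
a_min = a - 0.7
a_max = a + 0.7
ints = np.random.randint(17, 21, n, dtype=np.int32)

first = rand_2(a_min, a_max, ints)
second = rand_2(a_min, a_max, ints)

# One (length, 2) array per entry of ints, identical across both calls
same = all(np.array_equal(x, y) for x, y in zip(first, second))
shapes_ok = all(arr.shape == (k, 2) for arr, k in zip(first, ints))
```

Both same and shapes_ok should be True, because the function reseeds the generator at the start of every call and draws in a fixed, sequential order.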

Then you can replace the Python list (a Numba "reflected list") with a Numba typed List:

Array = nb.types.Array(dtype=nb.float64, ndim=2, layout="C")

@nb.njit
def rand_4(a_min, a_max, ints):
    np.random.seed(85)
    main_ = nb.typed.List.empty_list(Array)     # Typed List
    for i, length in enumerate(ints):
        min_x, min_y = a_min[i]
        max_x, max_y = a_max[i]
        temp_ = np.empty((length, 2))
        temp_[:, 0] = np.random.uniform(min_x, max_x, length)
        temp_[:, 1] = np.random.uniform(min_y, max_y, length)
        main_.append(temp_)
    return main_

This is now around 50x faster than the original.

The loop can be parallelized, but the list's append method can no longer be used because it is not thread-safe, so the list has to be pre-allocated. Because it is a typed list, it cannot be pre-allocated as [None]*n; a dummy array of the right type is used as a placeholder instead:

dummy_array = np.array([[0.]])

@nb.njit(parallel=True)
def rand_5(a_min, a_max, ints):
    np.random.seed(85)
    n = len(ints)
    main_ = nb.typed.List([dummy_array] * n)
    for i in nb.prange(n):                     # Parallel loop
        length = ints[i]
        min_x, min_y = a_min[i]
        max_x, max_y = a_max[i]
        temp_ = np.empty((length, 2))
        temp_[:, 0] = np.random.uniform(min_x, max_x, length)
        temp_[:, 1] = np.random.uniform(min_y, max_y, length)
        main_[i] = temp_                       # Avoiding append
    return main_

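This is only marginally faster than the previous version, and only when ints is very large. Also, the results will not be reproducible between calls: Numba keeps an independent random generator state for each thread, so the np.random.seed(85) call outside the parallel loop does not seed the worker threads.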
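A different layout, not part of the answer above, avoids the list altogether: store every point in one flat (total, 2) array and keep an offsets array that marks where each block starts. The sketch below is plain NumPy, not Numba. The name rand_flat and the per-block seeding with np.random.default_rng([seed, i]) are assumptions made for illustration, and the result is not identical to any of the versions above. Each block depends only on its own index, so the result does not depend on the order in which blocks are filled. Porting the loop to nb.prange would need a different per-iteration seeding mechanism, and the speed-up has not been measured.

```python
import numpy as np

def rand_flat(a_min, a_max, ints, seed=85):
    """Hypothetical helper: one flat (total, 2) array plus block offsets."""
    n = len(ints)
    offsets = np.zeros(n + 1, dtype=np.int64)
    offsets[1:] = np.cumsum(ints)           # block i is rows offsets[i]:offsets[i + 1]
    out = np.empty((offsets[-1], 2), dtype=np.float64)
    for i in range(n):
        # Independent generator per block, so the fill order does not matter
        rng = np.random.default_rng([seed, i])
        start, stop = offsets[i], offsets[i + 1]
        # a_min[i] and a_max[i] (shape (2,)) broadcast over the (length, 2) block
        out[start:stop] = rng.uniform(a_min[i], a_max[i], size=(stop - start, 2))
    return out, offsets

# Inputs as in the question
n = 10
a = np.random.rand(n, 2)
a_min = a - 0.7
a_max = a + 0.7
ints = np.random.randint(17, 21, n, dtype=np.int32)

points, offsets = rand_flat(a_min, a_max, ints)
block_3 = points[offsets[3]:offsets[4]]     # the (ints[3], 2) array for main-loop index 3
```

Slicing with offsets recovers the same per-index arrays that the list-based versions return, without needing a container of differently shaped arrays.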
