Reputation: 131
I am currently looping over a numpy array to slice it and do some ndarray operations. The time needed is currently much too long, due to the size of the array of 2001*2001 elements. Therefore I hope that someone might suggest a hint on how to speed up the code:
import cupy as cp
from time import time
height, width = 187, 746
org_sized = cp.random.rand(2001, 2001) * 60
height_mat = cp.random.rand(height, width) * 100 # originally values getting larger from (0, width//2) to the outside with the distance squared
indices = cp.indices((height, width))
y_offsets = indices[0]
x_offsets = indices[1] - (width + 1)/2
angle_mat = cp.round_(2*(90 - cp.rad2deg(cp.arctan2(y_offsets, x_offsets))) + 180).astype(int)
weights = cp.random.rand(361)/ 10 # weights originally larger in the middle
# pad the org_sized matrix with zeros to fit a size of (2001+height, 2001+width)
west = cp.zeros((org_sized.shape[0], width // 2))
east = cp.zeros((org_sized.shape[0], round(width // 2)))
enlarged_size = cp.hstack((west, org_sized))
enlarged_size = cp.hstack((enlarged_size, east))
south = cp.zeros((height, enlarged_size.shape[1]))
enlarged_size = cp.vstack((enlarged_size, south))
shadow_time_hrs = cp.zeros_like(org_sized)
for y in range(org_sized.shape[0]):
    start_time = time()
    for x in range(org_sized.shape[1]):
        # shift the enlarged matrix and the angles so that they match in size and are correctly aligned
        short_elevations = enlarged_size[y:y+height, x:x+width]
        overshadowed = (short_elevations - org_sized[y, x]) > height_mat
        shadowed_angles = angle_mat * overshadowed
        shadowed_segments = cp.unique(shadowed_angles)
        angle_segments = shadowed_segments
        sum_hours = cp.sum(weights[angle_segments])
        shadow_time_hrs[y, x] = sum_hours
    if (y % 100) == 0:
        print(f"Computation for line {y} took: {time() - start_time}.")
At first I used Numba's @njit on the function calc_shadow_point, but it turned out that it was 2 times slower than without. Therefore I switched from numpy arrays to cupy arrays, which gave a speed-up of about 50 %. Probably the gain is limited because the arrays are so small.
Are there other ways than iterating for this kind of problem, or is there a way to iterate with multi-threading over the iterators?
Edit: I changed the code to a minimal example with the same runtime (1.1 s per line of org_sized). Somehow I have to increase the computation speed. Everything below 10 % of the current computation time would make the code usable. Due to the remarks I changed np.unique to cp.unique, but as remarked, it didn't result in a large speed-up, barely 6 %. I am currently using a GTX 1060, but if it would help I could manage to use a 1660 Ti.
Upvotes: 1
Views: 1009
Reputation: 50836
unique is slow (both on CPUs and GPUs) because it generally uses either a hash-map or a sort internally. Moreover, as you said, the arrays are too small to be efficient on the GPU, resulting in huge kernel overheads. Hopefully, you do not need it: you can use bincount (with minlength=361 and a flattened array) because you know the values are small positive integers in the bounded range 0:361. Actually, you do not even need to count the values like bincount does; you just want to know which values of the range 0:361 exist in shadowed_angles. Thus, a faster variant of bincount can be written using Numba. Moreover, the array computations can be done in one go, reducing the amount of allocations and the memory pressure. Finally, parallelism can be used to speed up the computation (using prange and parallel=True of Numba).
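For reference, a minimal sketch (not benchmarked) of the bincount-based variant inside the question's CuPy loop, reusing the variable names from the question; since only the presence of each angle matters, the counts are reduced to a 0/1 mask before weighting:
# sketch: replacement for the cp.unique call in the question's inner loop
counts = cp.bincount(shadowed_angles.ravel(), minlength=361)  # occurrences per angle bin 0..360
sum_hours = cp.sum(weights * (counts > 0))                    # each present angle weighted once
This avoids the sort/hash work inside unique, although the overhead of launching many small GPU kernels remains, which is why the Numba CPU version below pays off.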
Here is the resulting CPU-based implementation:
import numpy as np
import numba as nb

# Note: the input arrays (org_sized, enlarged_size, angle_mat, height_mat, weights,
# shadow_time_hrs) are assumed to be NumPy arrays here (convert CuPy arrays with
# cp.asnumpy if needed); weights is read as a global and captured by Numba.

@nb.njit
def computeSumHours(org_sized, enlarged_size, angle_mat, height_mat, shadow_time_hrs, y, x):
    height, width = height_mat.shape
    short_elevations = enlarged_size[y:y+height, x:x+width]
    shadowed_segments = np.zeros(361)
    for y2 in range(height):
        for x2 in range(width):
            overshadowed = (short_elevations[y2, x2] - org_sized[y, x]) > height_mat[y2, x2]
            shadowed_angle = angle_mat[y2, x2] * overshadowed
            # mark the angle once with its weight instead of counting occurrences
            shadowed_segments[shadowed_angle] = weights[shadowed_angle]
    return shadowed_segments.sum()

@nb.njit(parallel=True)
def computeLine(org_sized, enlarged_size, angle_mat, height_mat, shadow_time_hrs, y):
    height, width = height_mat.shape
    for x in nb.prange(org_sized.shape[1]):
        shadow_time_hrs[y, x] = computeSumHours(org_sized, enlarged_size, angle_mat, height_mat, shadow_time_hrs, y, x)

def computeAllLines(org_sized, enlarged_size, angle_mat, height_mat, shadow_time_hrs):
    height, width = height_mat.shape
    for y in range(org_sized.shape[0]):
        start_time = time()
        computeLine(org_sized, enlarged_size, angle_mat, height_mat, shadow_time_hrs, y)
        if (y % 100) == 0:
            print("Computation for line %d took: %f." % (y, time() - start_time))

computeAllLines(org_sized, enlarged_size, angle_mat, height_mat, shadow_time_hrs)
Here are the timing results per iteration on my machine (using an i7-9600K and a GTX-1660-Super):
Reference implementation (CPU): 2.015 s
Reference implementation (GPU): 0.882 s
Optimized implementation (CPU): 0.082 s
This is 10 times faster than the reference GPU-based implementation and 25 times faster than the reference CPU-based one.
Note that the same technique can be used on the GPU, but not with CuPy's array operations: one needs to write a GPU kernel doing that (e.g. using CUDA). However, it is quite complex to do that efficiently.
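For illustration only, here is an untested sketch of what such a custom kernel could look like, launched through CuPy's RawKernel interface; the kernel name, the launch configuration and the int32 cast of angle_mat are assumptions, not part of the measured results:
shadow_kernel = cp.RawKernel(r'''
extern "C" __global__
void shadow_hours(const double* org_sized, const double* enlarged, const int* angle_mat,
                  const double* height_mat, const double* weights, double* out,
                  const int rows, const int cols, const int win_h, const int win_w,
                  const int enl_cols)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= cols || y >= rows) return;

    // per-thread flags marking which of the 361 angle bins are shadowed for this pixel
    bool seen[361];
    for (int i = 0; i < 361; ++i) seen[i] = false;

    const double ref = org_sized[y * cols + x];
    for (int y2 = 0; y2 < win_h; ++y2)
        for (int x2 = 0; x2 < win_w; ++x2) {
            const double elev = enlarged[(y + y2) * enl_cols + (x + x2)];
            const bool overshadowed = (elev - ref) > height_mat[y2 * win_w + x2];
            seen[overshadowed ? angle_mat[y2 * win_w + x2] : 0] = true;
        }

    // each shadowed angle contributes its weight exactly once, like unique/bincount
    double total = 0.0;
    for (int i = 0; i < 361; ++i)
        if (seen[i]) total += weights[i];
    out[y * cols + x] = total;
}
''', 'shadow_hours')

# hypothetical launch: one thread per output pixel of the 2001x2001 grid
# threads = (16, 16)
# blocks = ((2001 + 15) // 16, (2001 + 15) // 16)
# shadow_kernel(blocks, threads,
#               (org_sized, enlarged_size, angle_mat.astype(cp.int32), height_mat, weights,
#                shadow_time_hrs, cp.int32(2001), cp.int32(2001), cp.int32(height),
#                cp.int32(width), cp.int32(enlarged_size.shape[1])))
Getting such a kernel to actually run fast (memory access patterns, reusing the window between neighbouring pixels) is where the real complexity lies, as noted above.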
Upvotes: 3