Gtingstad

Reputation: 1

Efficient way to operate over a list of NumPy arrays of different sizes

I am writing a function to bin points by their angle in a polar coordinate system. I would like the option to perform some nonlinear downsampling of the points in each bin (taking the median coordinate, or the closest or furthest point by radial distance). I am able to split my array into views of each bin, but since the bins vary in size, I have not found a way to operate on each slice while fully utilizing vectorization.

I have achieved my best solution by sorting the points by angle, computing a quantized copy of the angles, and identifying the indices where the quantized value changes. I then split the sorted array at those indices.

At this point, I would like to compute metrics for each bin without using loops. I can't simply stack the slices into a 3D array since they are inhomogeneous. The way I have achieved this so far is by building an array of NaNs of shape [num_slices, length_of_largest_slice, 2], populating it along axis 0 with each slice (leaving the unindexed portions as NaN), and finally computing my metrics with NaN-ignoring operations. I don't believe this is memory efficient, and I assume that populating the array is quite slow.

Example code below:

import numpy as np

def downsample_points(points, bin_size, mode='mean'):
    # bin points by angle and reduce each bin to a single point according to `mode`
    polar_points = get_polar(points)    # convert points to polar (r, theta), sorted by angle
    quantized = polar_points[:, 1] // bin_size  # quantize the angles to the provided resolution

    split_key = np.nonzero(np.diff(quantized))[0] + 1  # indices where the bin changes
    max_size_key = np.append(np.insert(split_key, 0, 0), quantized.shape[0])  # add first and last index for size computation

    split_polar = np.split(polar_points, split_key)  # split the sorted points at the bin boundaries

    dim_0 = len(split_polar)            # number of bins
    dim_1 = max(np.diff(max_size_key))  # size of the largest bin

    reshaped_array = np.full(shape=(dim_0, dim_1, 2), fill_value=np.nan)  # NaN-padded array for the inhomogeneous slices

    for idx, arr in enumerate(split_polar):
        reshaped_array[idx, :arr.shape[0], :] = arr

    if mode == 'mean':
        res = np.nanmean(reshaped_array, axis=1)

    elif mode == 'median':
        res = np.nanmedian(reshaped_array, axis=1)

    elif mode == 'closest':
        min_indices = np.nanargmin(reshaped_array[:, :, 0], axis=-1)  # index of min r in each bin
        res = reshaped_array[np.arange(dim_0), min_indices, :]

    elif mode == 'furthest':
        max_indices = np.nanargmax(reshaped_array[:, :, 0], axis=-1)  # index of max r in each bin
        res = reshaped_array[np.arange(dim_0), max_indices, :]

    return get_cartesian(res)
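
For completeness, minimal versions of the two conversion helpers so the snippet above is runnable (points are (x, y) pairs; polar points are stored as (r, theta), with the angle in column 1, which is what the binning relies on):

def get_polar(points):
    # convert (x, y) points to (r, theta) and sort them by angle
    r = np.hypot(points[:, 0], points[:, 1])
    theta = np.arctan2(points[:, 1], points[:, 0])
    polar = np.column_stack((r, theta))
    return polar[np.argsort(polar[:, 1])]

def get_cartesian(polar):
    # convert (r, theta) back to (x, y)
    x = polar[:, 0] * np.cos(polar[:, 1])
    y = polar[:, 0] * np.sin(polar[:, 1])
    return np.column_stack((x, y))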

I'm wondering if numpy.ufunc methods or numpy.vectorize could be used to solve this. I have seen map used to similar ends, but I'm not sure how efficient that would be compared to a pure NumPy solution.
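
For example, for the 'mean' case I think np.add.reduceat over the same split indices would avoid the padded array entirely (untested sketch below, reusing polar_points and max_size_key from the code above), but I don't see how to express the median or the per-bin argmin/argmax needed for 'closest' and 'furthest' this way:

counts = np.diff(max_size_key)                                        # number of points in each bin
bin_sums = np.add.reduceat(polar_points, max_size_key[:-1], axis=0)   # per-bin sums of (r, theta)
bin_means = bin_sums / counts[:, None]                                # per-bin mean coordinates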

Upvotes: 0

Views: 46

Answers (0)
