Davide Dal Bosco

Reputation: 117

Suggestion to vectorize a Python function

I wrote the following function, which takes three 1D arrays (int_array, x, and y) and a number lim as inputs. The output is also a number.

import numpy as np

def integrate_to_lim(int_array, x, y, lim):
    if lim >= np.max(x):
        res = 0.0
    elif lim <= np.min(x):
        res = int_array[0]
    else:
        index = np.argmax(x > lim)    # Index of the first element of x larger than lim
        partial = int_array[index]
        slope = (y[index-1] - y[index]) / (x[index-1] - x[index])
        rest = (x[index] - lim) * (y[index] + (lim - x[index]) * slope / 2.0)
        res = partial + rest
    return res

Basically, apart from the limit cases lim >= np.max(x) and lim <= np.min(x), the function finds the index of the first value in x that is larger than lim and then uses it for a few simple calculations.

In my case, however, lim can also be a fairly large 2D array (shape roughly 2000 by 1000 elements).

I would like to rewrite it so that it performs the same calculations when lim is a 2D array. The output should then also be a 2D array with the same shape as lim.

I am really struggling to figure out how to vectorize it. I would like to use only NumPy.

PS: I want to vectorize my function because efficiency is important and, as I understand it, for loops are not a good choice in this regard.

Edit: my attempt

I was not aware of the function np.take, which made the task much easier. Here is my rough attempt, which seems to work (suggestions on how to clean up the code or make it faster are more than welcome).

def integrate_to_lim_vect(int_array, x, y, lim_mat):
    lim_mat = np.asarray(lim_mat)    # Make sure that it is an array
    
    shape_3d = list(lim_mat.shape) + [1]
    x_3d = np.ones(shape_3d) * x    # 3 dimensional version of x
    lim_3d = np.expand_dims(lim_mat, axis=2) * np.ones(x_3d.shape)   # also 3d
    
    # I use np.argmax on the 3d matrices (is there a simpler way?)
    index_mat = np.argmax(x_3d > lim_3d, axis=2)
    
    # Silly calculations
    partial = np.take(int_array, index_mat)
    y1_mat = np.take(y, index_mat)
    y2_mat = np.take(y, index_mat - 1)
    x1_mat = np.take(x, index_mat)
    x2_mat = np.take(x, index_mat - 1)
    slope = (y1_mat - y2_mat) / (x1_mat - x2_mat)
    rest = (x1_mat - lim_mat) * (y1_mat + (lim_mat - x1_mat) * slope / 2.0)
    res = partial + rest
    
    # Make the cases with np.select
    condlist = [lim_mat >= np.max(x), lim_mat <= np.min(x)]
    choicelist = [0.0, int_array[0]]    # Should these options be 2D arrays?
    output = np.select(condlist, choicelist, default=res)
    return output 

I am aware that if the limit is greater than or equal to the maximum value in x, the comparison x > lim is False everywhere and np.argmax returns index zero (leading to wrong results). This is why I used np.select to check and correct for these cases.
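
The argmax behaviour described here can be reproduced with a tiny example. The array values are made up purely for illustration:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # illustrative values only
mask = x > 5.0                  # no element of x exceeds the limit
first_true = np.argmax(mask)    # argmax of an all-False mask is 0
has_match = mask.any()          # False here, so index 0 is not a real match
```

Checking mask.any() (or comparing the limit against the extremes of x, as np.select does above) is what distinguishes a genuine index 0 from "no match".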

Is it necessary to define the three-dimensional arrays x_3d and lim_3d, or is there a simpler way to find the 2D array of indices index_mat?

Suggestions, especially to improve the way I expanded the dimension of the arrays, are welcome.

Upvotes: 0

Views: 75

Answers (1)

VBB

Reputation: 1325

I think you can solve this using two tricks. First, a 2D array can easily be flattened to a 1D array, and your answers can then be converted back into a 2D array with reshape.
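
A minimal sketch of that flatten/reshape round trip, with made-up data and a doubling step standing in for the real calculation:

```python
import numpy as np

lim_mat = np.arange(6.0).reshape(2, 3)       # stand-in for the 2D lim array
flat = lim_mat.ravel()                       # 1D view of the same values
result_flat = flat * 2.0                     # placeholder for any element-wise computation
result = result_flat.reshape(lim_mat.shape)  # back to the original 2D shape
```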

Next, your use of argmax suggests that your array x is sorted. If so, you can find the full set of indices using np.digitize: instead of a single index, you get a complete array of indices. All the calculations you are doing are natively supported as array operations in NumPy, so that should not cause any problems.
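
As a small check of this idea, assuming x is sorted in increasing order (the values below are made up), np.digitize gives the same indices as the argmax approach without building any 3D array. In current NumPy versions np.digitize also accepts a 2D input directly, so the flattening step is optional:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])         # assumed sorted, increasing
lim = np.array([[0.5, 1.0], [2.5, 0.1]])   # all values inside the range of x

# Index of the first element of x strictly greater than each lim value
idx_digitize = np.digitize(lim, x)

# Same indices via broadcasting and argmax, for comparison only
idx_argmax = np.argmax(x > lim[..., None], axis=-1)
```

The two only agree while lim stays below the maximum of x: for larger values np.digitize returns len(x), whereas argmax returns 0.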

You will have to handle the limiting cases specifically. If those are rare enough, it might be okay to let the default formula compute them anyway (the results will be garbage values) and then replace them with the actual values you want.
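
Putting the pieces together, one possible sketch looks like this. It assumes x is 1D, strictly increasing, and has at least two elements; the function name integrate_to_lim_digitize is hypothetical, and the formula is copied from the question rather than verified independently:

```python
import numpy as np

def integrate_to_lim_digitize(int_array, x, y, lim):
    """Sketch only: assumes x is 1D, strictly increasing, with len(x) >= 2."""
    int_array = np.asarray(int_array, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lim = np.asarray(lim, dtype=float)

    flat = lim.ravel()

    # Index of the first element of x strictly greater than each limit
    idx = np.digitize(flat, x)
    # Clip so that idx and idx - 1 are always valid; out-of-range limits
    # produce garbage here and are overwritten below
    idx = np.clip(idx, 1, len(x) - 1)

    slope = (y[idx - 1] - y[idx]) / (x[idx - 1] - x[idx])
    rest = (x[idx] - flat) * (y[idx] + (flat - x[idx]) * slope / 2.0)
    res = int_array[idx] + rest

    # Limiting cases; the upper limit is applied last so it takes priority,
    # matching the order of the conditions in the question
    res = np.where(flat <= x[0], int_array[0], res)
    res = np.where(flat >= x[-1], 0.0, res)

    return res.reshape(lim.shape)
```

Because x is assumed sorted, x[0] and x[-1] stand in for np.min(x) and np.max(x). Memory use stays proportional to the size of lim, since no array of shape lim.shape + (len(x),) is created.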

Upvotes: 1
