Reputation: 117
I wrote the following function, which takes as inputs three 1D array (namely int_array
, x
, and y
) and a number lim
. The output is a number as well.
def integrate_to_lim(int_array, x, y, lim):
if lim >= np.max(x):
res = 0.0
if lim <= np.min(x):
res = int_array[0]
else:
index = np.argmax(x > lim) # To find the first element of x larger than lim
partial = int_array[index]
slope = (y[index-1] - y[index]) / (x[index-1] - x[index])
rest = (x[index] - lim) * (y[index] + (lim - x[index]) * slope / 2.0)
res = partial + rest
return res
Basically, outside form the limit cases lim>=np.max(x)
and lim<=np.min(x)
, the idea is that the function finds the index of the first value of the array x
larger than lim
and then uses it to make some simple calculations.
In my case, however lim
can also be a fairly big 2D array (shape ~2000 times ~1000 elements)
I would like to rewrite it such that it makes the same calculations for the case that lim
is a 2D array.
Obviously, the output should also be a 2D array of the same shape of lim
.
I am having a real struggle figuring out how to vectorize it.
I would like to stick only to the numpy
package.
PS I want to vectorize my function because efficiency is important and as I understand using for loops is not a good choice in this regard.
I was not aware of the function np.take
, which made the task way easier.
Here is my brutal attempt that seems to work (suggestions on how to clean up or to make the code faster are more than welcome).
def integrate_to_lim_vect(int_array, x, y, lim_mat):
lim_mat = np.asarray(lim_mat) # Make sure that it is an array
shape_3d = list(lim_mat.shape) + [1]
x_3d = np.ones(shape_3d) * x # 3 dimensional version of x
lim_3d = np.expand_dims(lim_mat, axis=2) * np.ones(x_3d.shape) # also 3d
# I use np.argmax on the 3d matrices (is there a simpler way?)
index_mat = np.argmax(x_3d > lim_3d, axis=2)
# Silly calculations
partial = np.take(int_array, index_mat)
y1_mat = np.take(y, index_mat)
y2_mat = np.take(y, index_mat - 1)
x1_mat = np.take(x, index_mat)
x2_mat = np.take(x, index_mat - 1)
slope = (y1_mat - y2_mat) / (x1_mat - x2_mat)
rest = (x1_mat - lim_mat) * (y1_mat + (lim_mat - x1_mat) * slope / 2.0)
res = partial + rest
# Make the cases with np.select
condlist = [lim_mat >= np.max(x), lim_mat <= np.min(x)]
choicelist = [0.0, int_array[0]] # Shoud these options be a 2d matrix?
output = np.select(condlist, choicelist, default=res)
return output
I am aware that if the limit is larger than the maximum value in the array np.argmax
returns the index zero (leading to wrong results). This is why I used np.select
to check and correct for these cases.
Is it necessary to define the three dimensional matrices x_3d
and lim_3d
, or there is a simpler way to find the 2D matrix of the indices index_mat
?
Suggestions, especially to improve the way I expanded the dimension of the arrays, are welcome.
Upvotes: 0
Views: 75
Reputation: 1325
I think you can solve this using two tricks. First, a 2d array can be easily flattened to a 1d array, and then your answers can be converted back into a 2d array with reshape.
Next, your use of argmax
suggests that your array is sorted. Then you can find your full set of indices using digitize. Thus instead of a single index, you will get a complete array of indices. All the calculations you are doing are intrinsically supported as array operations in numpy, so that should not cause any problems.
You will have to specifically look at the limiting cases. If those are rare enough, then it might be okay to let the answers be derived by the default formula (they will be garbage values), and then replace them with the actual values you desire.
Upvotes: 1