Finding the argmax() of a column based on constraints in other columns of a NumPy array

Question

My question is straightforward, and there is probably a simple solution that I have not managed to find. I first concatenate some arrays, and then I want to find the combination of the first and second columns (data_x1, data_x2) that gives the maximum value of y. There is one constraint: every x must lie between -20 and 20. Any row where an x is greater than 20 or less than -20 should be ignored.

I am doing this inside a function, so I am looking for an approach that works for any number of x columns. To summarize: I want the optimal y for the constrained data_x1 and data_x2, that is, the largest value in data_y among the rows whose data_x1 and data_x2 values both satisfy the condition above (> -20 and < 20). In the dataset I am providing, for example, the row that contains the maximum data_y violates these conditions. When I try:

y_max = data_y.max()   
ID = data_y.argmax()
x1_max = data_x1[ID]
x2_max = data_x2[ID]

I get an x2_max that is beyond the limit I want to impose.

Here is the dataset:

data_x1 = np.array([ 7.50581267e-01,  4.85312598e+00, -1.37035821e+00, -1.27199171e-03,
       -1.61347902e+00, -2.47705419e+00,  1.54149227e-01,  2.96462913e+00,
        6.39336584e+00,  2.22526551e+00, -3.13825557e+00, -4.53521105e+00,
        3.66632759e+00,  6.95980810e-01, -2.08555389e+00, -3.42268057e+00,
       -2.67733126e+00,  3.44611056e+00, -3.21242281e-01, -4.45557410e+00,
        2.36357280e+00,  6.76143624e-01, -1.12756068e+00,  1.56898158e+00,
       -2.73721604e+00,  2.63754963e+00, -4.52874687e+00, -2.96449234e+00,
       -4.38481329e+00, -1.50384134e+00, -2.52651726e+00, -1.34210192e+00,
       -2.39860669e-01,  1.40859346e+00,  1.85432054e-01,  5.01414945e-01,
        4.55880766e+00, -1.05363585e+00, -4.62917198e+00,  2.59998127e+00,
        5.25344447e+00,  3.07701918e-01,  2.26443850e+00, -2.22101423e+00,
        3.02861897e-01,  1.65691179e+00,  8.81562566e-01, -1.87325712e+00,
        4.63772521e+00,  2.64284088e-01,  2.53643045e+00,  9.63172795e-01,
        2.36685850e+00,  2.54559573e+00, -9.02629613e-01,  2.24687227e+00,
        6.22720302e+00,  5.74281188e+00,  2.03796010e+00,  4.80760151e+00])


data_x2 = np.array([-30.09938636, -28.83362992, -22.57425202, -23.14358566,
       -33.59852454, -27.51674098, -30.7885103 , -25.90249062,
       -22.08337401, -29.07237476, -23.04023689, -30.30583811,
       -21.00309374, -29.99686696, -28.90991919, -26.62903318,
       -31.72168863, -22.87107873, -30.729956  , -25.6780506 ,
       -31.38729541, -27.19055645, -27.55148381, -28.68462801,
       -26.05224771, -30.87040206, -22.95430799, -26.91256322,
       -35.8942374 , -21.50322056, -26.16176442, -22.85920962,
       -28.05071496, -34.30775127, -28.7790589 , -31.19811517,
       -27.63535267, -28.96808588, -26.89286845, -32.81312953,
       -27.35855807, -28.89865079, -25.61937868, -32.59681293,
       -28.79511822, -22.54470727, -31.06309398, -25.30574423,
       -23.52838694, -27.55017459, -24.55437336, -24.39558638,
       -22.81063876, -28.62340189, -27.85680254, -25.10753673,
       -29.75683744, -27.37575317, -29.61561727, -34.50702866])

data_y = np.array([2511661.54014723, 2506471.03096404, 2496512.87703406,
       2500666.09145807, 2492786.42701569, 2513191.79101637,
       2509515.1829362 , 2509970.89367091, 2481463.90896938,
       2512505.17266542, 2496999.56860772, 2503950.65803291,
       2481665.31885133, 2511985.61283778, 2512968.70827174,
       2510599.791468  , 2502795.50006905, 2495342.7106848 ,
       2509708.93248061, 2505715.61726413, 2504986.68522465,
       2514933.54167635, 2514835.36052355, 2513916.01349115,
       2510784.07070835, 2506718.40944214, 2493199.57962053,
       2511925.51820147, 2466117.27254433, 2488828.88557003,
       2511417.16267116, 2498364.67720219, 2515221.17931068,
       2487471.40157182, 2514636.01655828, 2507757.43933369,
       2508292.40113149, 2514000.76143246, 2507722.80700035,
       2496671.63747914, 2505965.77313117, 2514453.85665244,
       2510375.19913626, 2498705.33749204, 2514595.64115671,
       2496054.0775116 , 2508144.96504256, 2509901.46588431,
       2496183.49020786, 2515239.10310988, 2506016.58240813,
       2507055.51518852, 2496891.65309883, 2512606.04865712,
       2515010.58385846, 2508707.73815183, 2499240.78218084,
       2504177.72406016, 2511686.21461949, 2477825.15797829])

I hope I managed to be succinct and precise despite the length of the explanation. I would really appreciate your help on this one!

Damian Vu · Accepted Answer

Your data_x2 contains no values between -20 and 20.

If you can use pandas for this, you can do (example is for -30 < x < 30)

import pandas as pd
df = pd.DataFrame({'x1': data_x1, 'x2': data_x2, 'y': data_y})
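# inclusive='neither' excludes both endpoints (pandas >= 1.3; older versions used inclusive=False)
df = df[df['x1'].between(-30, 30, inclusive='neither') & df['x2'].between(-30, 30, inclusive='neither')]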

df.sort_values(by='y', ascending=False).iloc[0]
Output: 
x1    2.642841e-01
x2   -2.755017e+01
y     2.515239e+06
Name: 49, dtype: float64

Here's a function for calculating this. (Again using pandas)

def func(x1, x2, y, lower_bound, upper_bound):
    df = pd.DataFrame({'x1': x1, 'x2': x2, 'y': y})
    # Keep only rows where both x1 and x2 lie strictly inside the bounds
    df = df[df['x1'].between(lower_bound, upper_bound, inclusive='neither') & df['x2'].between(lower_bound, upper_bound, inclusive='neither')]
    df = df.sort_values(by='y', ascending=False)
    # Falls through and returns None when no row satisfies the bounds
    if len(df):
        return df['x1'].iloc[0], df['x2'].iloc[0]

func(data_x1, data_x2, data_y, -20, 20)
Output:
None

func(data_x1, data_x2, data_y, -30, 30)
Output:
(0.264284088, -27.55017459)
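
Since the question asks for something that works with any number of x columns, here is a minimal sketch of how the same idea could be generalized. The helper name best_row and the toy DataFrame are made up for illustration; it uses idxmax instead of sorting, which is a swapped-in choice rather than part of the function above.

```python
import pandas as pd

def best_row(df, x_cols, y_col, lower, upper):
    """Hypothetical helper: return the row with the largest y among rows whose
    x columns all lie strictly inside (lower, upper), or None if none qualify."""
    # Boolean Series: True where every listed x column is inside the bounds
    inside = (df[x_cols].gt(lower) & df[x_cols].lt(upper)).all(axis=1)
    if not inside.any():
        return None
    # idxmax returns the index label of the largest y among the valid rows
    return df.loc[df.loc[inside, y_col].idxmax()]

# Toy data (made up), not the dataset from the question
toy = pd.DataFrame({
    "x1": [1.0, 25.0, -3.0, 4.0],
    "x2": [-5.0, 2.0, -21.0, 10.0],
    "y": [10.0, 99.0, 50.0, 20.0],
})
row = best_row(toy, ["x1", "x2"], "y", -20, 20)
```

Here the rows with y = 99 and y = 50 are rejected because one of their x values is out of range, so the row with y = 20 is returned. Adding a third constraint column only means adding its name to x_cols.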

EDIT:

Using pandas DataFrame is nice because it treats your data as a matrix where you can slice based on values in multiple columns. The numpy solution below works, but requires replacing values that are outside of your range with np.nan in order to keep your indexes the same.

Here's a pure NumPy solution, with help from the question "Removing nan in array at position from another numpy array":

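import numpy as np

# Replace out-of-range values with NaN so every array keeps its original indexing
data_x1 = np.where(np.logical_and(data_x1 > -30, data_x1 < 30), data_x1, np.nan)
data_x2 = np.where(np.logical_and(data_x2 > -30, data_x2 < 30), data_x2, np.nan)
# A row is valid only if both x values survived the range check
mask = ~np.isnan(data_x1) & ~np.isnan(data_x2)
data_y = np.where(mask, data_y, np.nan)
# nanargmax ignores the NaN rows (it raises ValueError if every row is NaN)
idx = np.nanargmax(data_y)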

data_x1[idx], data_x2[idx]
Output:
(0.264284088, -27.55017459)

That said, I agree with Evgeny: a pandas DataFrame is easier to follow, IMO.
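
If you want to stay in NumPy without overwriting the input arrays with NaN, a boolean mask plus np.flatnonzero may be enough. This is only a sketch under assumptions: the function name constrained_argmax, its keyword arguments, and the toy arrays are invented for illustration, and it accepts any number of x arrays.

```python
import numpy as np

def constrained_argmax(y, *xs, lower=-20.0, upper=20.0):
    """Hypothetical helper: return (index, y value, x values) for the largest y
    whose x's all lie strictly inside (lower, upper), or None if no row fits."""
    y = np.asarray(y, dtype=float)
    # Stack the x arrays as columns: shape (n_rows, n_x)
    X = np.column_stack([np.asarray(x, dtype=float) for x in xs])
    # True for rows where every x is inside the open interval
    mask = np.all((X > lower) & (X < upper), axis=1)
    if not mask.any():
        return None
    valid = np.flatnonzero(mask)        # original indices of the valid rows
    best = valid[np.argmax(y[valid])]   # map the argmax back to the original index
    return int(best), float(y[best]), tuple(X[best].tolist())

# Toy arrays (made up), not the dataset from the question
x1 = np.array([1.0, 25.0, -3.0, 4.0])
x2 = np.array([-5.0, 2.0, -21.0, 10.0])
y = np.array([10.0, 99.0, 50.0, 20.0])
result = constrained_argmax(y, x1, x2)
```

With the toy arrays, the rows holding y = 99 and y = 50 each have one x outside the bounds, so result points at index 3. Because the inputs are left untouched, the function can be called again with different bounds.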
