Reputation: 33

Determining if a column value is between a conditional range based on another column

I have a dataframe that looks as follows:

    data = np.array([[5, 'red', 2,6, 8, 10],
                 [11, 'red', 3,9,6,15],
                 [8, 'blue', 0, 3, 5, 10],
                 [2, 'blue', 1, 2, 3, 4]])
    df = pd.DataFrame(data, columns = ['A','B','red_lower', 'red_upper', 'blue_lower', 'blue_upper'])

    A     B red_lower red_upper blue_lower blue_upper
0   5   red         2         6          8         10
1  11   red         3         9          6         15
2   8  blue         0         3          5         10
3   2  blue         1         2          3          4

I'd like to create an additional column that tells me if the value in a column A is in the range of the color given in column B. For example, in row 0, since 5 has the designation red, I will check if 5 is between 2 and 6. It is, so I will have the new column have a 1.

Desired result:

    A    B   red_lower red_upper blue_lower blue_upper in_range
0   5   red         2         6          8         10        1
1  11   red         3         9          6         15        0
2   8  blue         0         3          5         10        1
3   2  blue         1         2          3          4        0

I've tried to write a loop, but I'm getting many series errors. I really dont want to have to split up the dataframe (by color), but maybe that's the way to go? (in my actual dataframe, there are six different 'colors', not just two).

Thank you!

EDIT: bonus if we have the additional column tell me if the value is above or below the range! For example, in row 1, 11 is outside the range, so is too high. Table should look this way:

    A     B red_lower red_upper blue_lower blue_upper in_range
0   5   red         2         6          8         10   inside
1  11   red         3         9          6         15    above
2   8  blue         0         3          5         10   inside
3   2  blue         1         2          3          4    below

Upvotes: 3

Answers (2)

user3483203

Reputation: 51185

`justify` + `broadcast` + `mask` + `logical_and`

You can use some nifty broadcasting here, and the function justify from another answer. This assumes that each color has a single valid range. It also assumes that all of your numeric columns are in fact numeric.

values = df.A.values
colors = df.B.values

range_frame = df.iloc[:, 2:]
ranges = range_frame.columns.str.split('_').str[0].values

m = colors != ranges[:, None]
masked = range_frame.mask(m)

jf = justify(masked.values, invalid_val=np.nan)[:, :2]
ir = np.logical_and(jf[:, 0] < values, values < jf[:, 1]).astype(int)

c1 = values <= jf[:, 0]
c2 = values >= jf[:, 1]

irl = np.select([c1, c2], ['below', 'above'], 'inside')

df.assign(in_range=ir, in_range_flag=irl)

    A     B  red_lower  red_upper  blue_lower  blue_upper  in_range in_range_flag
0   5   red          2          6           8          10         1        inside
1  11   red          3          9           6          15         0         above
2   8  blue          0          3           5          10         1        inside
3   3  blue          1          2           3           4         0         below

`stack` + `reshape` + `logical_and`

Again making the same assumptions as the first answer.

u = df.set_index(['A', 'B']).stack().rename_axis(['A', 'B', 'flag']).reset_index()
frame = u[u.flag.str.split('_').str[0] == u.B]

values = frame[::2].A.values
ranges = frame[0].values.reshape(-1, 2)

ir = np.logical_and(ranges[:, 0] < values, values < ranges[:, 1])

c1 = values <= ranges[:, 0]
c2 = values >= ranges[:, 1]

irl = np.select([c1, c2], ['below', 'above'], 'inside')

df.assign(in_range=ir, in_range_flag=irl)

Here is the definition for the justify function by @Divakar:

def justify(a, invalid_val=0, axis=1, side='left'):    
    """
    Justifies a 2D array

    Parameters
    ----------
    A : ndarray
        Input array to be justified
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

Upvotes: 3

BENY

Reputation: 323386

Here is using groupby split the df and most of step handled by the definition , which means you do not need input the different color each time

l=[]
for name,x  in df.groupby('B',sort=False):
    s1=(x.A >= x.filter(like=name).iloc[:, 0]) & (x.A <= x.filter(like=name).iloc[:, 1])
    s2=x.A<x.filter(like=name).iloc[:, 0]
    l.extend(np.select([s1,s2],['inside','below'],default='above').tolist())

df['in_range']=l
df
Out[64]: 
    A     B  red_lower  red_upper  blue_lower  blue_upper in_range
0   5   red          2          6           8          10   inside
1  11   red          3          9           6          15    above
2   8  blue          0          3           5          10   inside
3   2  blue          1          2           3           4    below

Upvotes: 3

Determining if a column value is between a conditional range based on another column

Answers (2)

justify + broadcast + mask + logical_and

stack + reshape + logical_and

Related Questions

`justify` + `broadcast` + `mask` + `logical_and`

`stack` + `reshape` + `logical_and`