Reputation: 33
I have a dataframe that looks as follows:
data = np.array([[5, 'red', 2,6, 8, 10],
[11, 'red', 3,9,6,15],
[8, 'blue', 0, 3, 5, 10],
[2, 'blue', 1, 2, 3, 4]])
df = pd.DataFrame(data, columns = ['A','B','red_lower', 'red_upper', 'blue_lower', 'blue_upper'])
A B red_lower red_upper blue_lower blue_upper
0 5 red 2 6 8 10
1 11 red 3 9 6 15
2 8 blue 0 3 5 10
3 2 blue 1 2 3 4
I'd like to create an additional column that tells me if the value in a column A is in the range of the color given in column B. For example, in row 0, since 5 has the designation red, I will check if 5 is between 2 and 6. It is, so I will have the new column have a 1.
Desired result:
A B red_lower red_upper blue_lower blue_upper in_range
0 5 red 2 6 8 10 1
1 11 red 3 9 6 15 0
2 8 blue 0 3 5 10 1
3 2 blue 1 2 3 4 0
I've tried to write a loop, but I'm getting many series errors. I really dont want to have to split up the dataframe (by color), but maybe that's the way to go? (in my actual dataframe, there are six different 'colors', not just two).
Thank you!
EDIT: bonus if we have the additional column tell me if the value is above or below the range! For example, in row 1, 11 is outside the range, so is too high. Table should look this way:
A B red_lower red_upper blue_lower blue_upper in_range
0 5 red 2 6 8 10 inside
1 11 red 3 9 6 15 above
2 8 blue 0 3 5 10 inside
3 2 blue 1 2 3 4 below
Upvotes: 3
Views: 283
Reputation: 51165
justify
+ broadcast
+ mask
+ logical_and
You can use some nifty broadcasting here, and the function justify
from another answer. This assumes that each color has a single valid range. It also assumes that all of your numeric columns are in fact numeric.
values = df.A.values
colors = df.B.values
range_frame = df.iloc[:, 2:]
ranges = range_frame.columns.str.split('_').str[0].values
m = colors != ranges[:, None]
masked = range_frame.mask(m)
jf = justify(masked.values, invalid_val=np.nan)[:, :2]
ir = np.logical_and(jf[:, 0] < values, values < jf[:, 1]).astype(int)
c1 = values <= jf[:, 0]
c2 = values >= jf[:, 1]
irl = np.select([c1, c2], ['below', 'above'], 'inside')
df.assign(in_range=ir, in_range_flag=irl)
A B red_lower red_upper blue_lower blue_upper in_range in_range_flag
0 5 red 2 6 8 10 1 inside
1 11 red 3 9 6 15 0 above
2 8 blue 0 3 5 10 1 inside
3 3 blue 1 2 3 4 0 below
stack
+ reshape
+ logical_and
Again making the same assumptions as the first answer.
u = df.set_index(['A', 'B']).stack().rename_axis(['A', 'B', 'flag']).reset_index()
frame = u[u.flag.str.split('_').str[0] == u.B]
values = frame[::2].A.values
ranges = frame[0].values.reshape(-1, 2)
ir = np.logical_and(ranges[:, 0] < values, values < ranges[:, 1])
c1 = values <= ranges[:, 0]
c2 = values >= ranges[:, 1]
irl = np.select([c1, c2], ['below', 'above'], 'inside')
df.assign(in_range=ir, in_range_flag=irl)
Here is the definition for the justify
function by @Divakar:
def justify(a, invalid_val=0, axis=1, side='left'):
"""
Justifies a 2D array
Parameters
----------
A : ndarray
Input array to be justified
axis : int
Axis along which justification is to be made
side : str
Direction of justification. It could be 'left', 'right', 'up', 'down'
It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
"""
if invalid_val is np.nan:
mask = ~np.isnan(a)
else:
mask = a!=invalid_val
justified_mask = np.sort(mask,axis=axis)
if (side=='up') | (side=='left'):
justified_mask = np.flip(justified_mask,axis=axis)
out = np.full(a.shape, invalid_val)
if axis==1:
out[justified_mask] = a[mask]
else:
out.T[justified_mask.T] = a.T[mask.T]
return out
Upvotes: 3
Reputation: 323226
Here is using groupby
split the df and most of step handled by the definition , which means you do not need input the different color each time
l=[]
for name,x in df.groupby('B',sort=False):
s1=(x.A >= x.filter(like=name).iloc[:, 0]) & (x.A <= x.filter(like=name).iloc[:, 1])
s2=x.A<x.filter(like=name).iloc[:, 0]
l.extend(np.select([s1,s2],['inside','below'],default='above').tolist())
df['in_range']=l
df
Out[64]:
A B red_lower red_upper blue_lower blue_upper in_range
0 5 red 2 6 8 10 inside
1 11 red 3 9 6 15 above
2 8 blue 0 3 5 10 inside
3 2 blue 1 2 3 4 below
Upvotes: 3