Reputation: 231
I have two DataFrames shown below. The DataFrames in reality are larger than the sample below.
df1
route_no cost_h1 cost_h2 cost_h3 cost_h4 cost_h5 max min location
0 0010 20 22 21 23 26 26 20 NY
1 0011 30 25 23 31 33 33 23 CA
2 0012 67 68 68 69 65 69 67 GA
3 0013 34 33 31 30 35 35 31 MO
4 0014 44 42 40 39 50 50 39 WA
df2
route_no cost_h1 cost_h2 cost_h3 cost_h4 cost_h5 location
0 0020 19 27 21 24 20 NY
1 0021 31 22 23 30 33 CA
2 0023 66 67 68 70 65 GA
3 0022 34 33 31 30 35 MO
4 0025 41 42 40 39 50 WA
5 0030 19 26 20 24 20 NY
6 0032 37 31 31 20 35 MO
7 0034 40 41 39 39 50 WA
The idea is to compare each row of df2 against the appropriate max and min value specified in df1. The threshold value to be compared depends on the match in the location column. If any of the row values are outside the range defined by min and max value, they will be put in a separate dataframe. Please note the number of cost segments are vary.
Upvotes: 2
Views: 236
Reputation: 71689
# Merge the dataframes on location to append the min/max columns to df2
df3 = df2.merge(df1[['location', 'max', 'min']], on='location', how='left')
# select the cost like columns
cost = df3.filter(like='cost')
# Check whether the cost values satisfy the interval condition
mask = cost.ge(df3['min'], axis=0) & cost.le(df3['max'], axis=0)
# filter the rows where one or more values in row do not satisfy the condition
df4 = df2[~mask.all(axis=1)]
print(df4)
route_no cost_h1 cost_h2 cost_h3 cost_h4 cost_h5 location
0 0020 19 27 21 24 20 NY
1 0021 31 22 23 30 33 CA
2 0023 66 67 68 70 65 GA
3 0022 34 33 31 30 35 MO
5 0030 19 26 20 24 20 NY
6 0032 37 31 31 20 35 MO
Upvotes: 2