arilwan

Reputation: 3973

Delete the rows of a DataFrame satisfying conditions evaluated against multiple columns

I would like to filter my DataFrame by evaluating conditions against several of its columns. The following example illustrates what I want to do:

import pandas as pd

df = {'user': [1,1,1,2,2,2],
      'speed':[10,20,90,15,39, 10],
      'acceleration': [9.8,29,5,4,7, 3],
      'jerk':[50,60,60,40,20,-50],
      'mode':['car','car','car','metro','metro', 'metro']}


df = pd.DataFrame.from_dict(df)
df
    user  speed   acceleration  jerk  mode
0     1     10           9.8    50    car
1     1     20          29.0    60    car
2     1     90           5.0    60    car
3     2     15           4.0    40  metro
4     2     39           7.0    20  metro
5     2     10           3.0   -50  metro

In the given example, I would like to filter the DataFrame based on per-mode thresholds set against the speed, acceleration and jerk columns, as in the table below:

+-------+-------+--------------+-------------+
|       | speed | acceleration |    jerk     |
+-------+-------+--------------+------+------+
| mode  | max   | max          | min  | max  |
+-------+-------+--------------+------+------+
| car   | 50    | 10           | -100 | 100  |
| metro | 35    | 5            | -40  | 60   |
+-------+-------+--------------+------+------+

So only rows whose speed and acceleration are below the max, and whose jerk lies within the min-max range for the row's mode, should be kept (equivalently, rows that fail any of these conditions should be deleted).

Upvotes: 2

Views: 48

Answers (2)

MrNobody33

Reputation: 6483

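You can use reindex to expand the thresholds to one row per row of df, and then build a boolean mask (threshold is the DataFrame constructed under Details):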

threshold=threshold.reindex(df['mode'])

threshold=threshold.reset_index(drop=True)

msk=(df.acceleration.lt(threshold['acceleration','max']))&\
    (df.speed.lt(threshold['speed','max']))&\
    (df.jerk.ge(threshold['jerk','min'])&\
     df.jerk.le(threshold['jerk','max']))
df[msk]

Details

Taking this threshold dataframe:

threshold=pd.DataFrame({'s':['car','car','metro','metro'],
                        'acceleration':[10,5,5,2],
                       'speed':[50,5,35,2],
                       'jerk':[-100,100,60,-40]})
threshold=threshold.groupby('s').agg({'acceleration':'max',
                                 'speed':'max',
                                 'jerk':['min','max']})

threshold
#      acceleration speed jerk     
#               max   max  min  max
#s                                 
#car             10    50 -100  100
#metro            5    35  -40   60

You can reindex it by the 'mode' column of df, so that it has one row of thresholds for each row of df:

threshold=threshold.reindex(df['mode'])
#      acceleration speed jerk     
#               max   max  min  max
#mode                              
#car             10    50 -100  100
#car             10    50 -100  100
#car             10    50 -100  100
#metro            5    35  -40   60
#metro            5    35  -40   60
#metro            5    35  -40   60

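Resetting the index gives the thresholds the same 0..5 index as df, so the comparisons line up row by row:

threshold=threshold.reset_index(drop=True)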

msk=(df.acceleration.lt(threshold['acceleration','max']))&\
    (df.speed.lt(threshold['speed','max']))&\
    (df.jerk.ge(threshold['jerk','min'])&\
     df.jerk.le(threshold['jerk','max']))

df[msk]
#   user  speed  acceleration  jerk   mode
#0     1     10           9.8    50    car
#3     2     15           4.0    40  metro
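
As an alternative to the reindex step, the same filter can be sketched with a merge on 'mode'. This is only an illustration, not the method above: the flat table `limits` and its column names (`speed_max`, `acceleration_max`, `jerk_min`, `jerk_max`) are made up for the example, with the values taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'user': [1, 1, 1, 2, 2, 2],
    'speed': [10, 20, 90, 15, 39, 10],
    'acceleration': [9.8, 29, 5, 4, 7, 3],
    'jerk': [50, 60, 60, 40, 20, -50],
    'mode': ['car', 'car', 'car', 'metro', 'metro', 'metro'],
})

# Hypothetical flat threshold table: one row per mode, plain column names.
limits = pd.DataFrame({
    'mode': ['car', 'metro'],
    'speed_max': [50, 35],
    'acceleration_max': [10, 5],
    'jerk_min': [-100, -40],
    'jerk_max': [100, 60],
})

# A left merge keeps df's row order and attaches each row's thresholds.
merged = df.merge(limits, on='mode', how='left')

msk = (
    merged['speed'].lt(merged['speed_max'])
    & merged['acceleration'].lt(merged['acceleration_max'])
    & merged['jerk'].between(merged['jerk_min'], merged['jerk_max'])
)

# merged has a fresh 0..n-1 index in df's row order, so select by position.
result = df[msk.to_numpy()]
print(result)
```

A mode that is missing from `limits` would get NaN thresholds after the merge, and comparisons against NaN are False, so such rows would be dropped rather than kept.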

Upvotes: 2

Johnny_Mali39

Reputation: 40

Maybe the where clause is what you're looking for.
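
If this refers to pandas' `DataFrame.where` (an assumption, since the answer does not name a function), note that `where` keeps the frame's shape and replaces the non-matching rows with NaN, so a `dropna` is still needed to actually delete them. A small sketch, using fixed thresholds (the car limits from the question) purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'speed': [10, 20, 90],
    'acceleration': [9.8, 29.0, 5.0],
})

# Illustrative fixed thresholds (the car limits from the question).
msk = df['speed'].lt(50) & df['acceleration'].lt(10)

masked = df.where(msk)           # same shape, failing rows become NaN
kept = masked.dropna(how='all')  # drop the rows that were masked out
same = df[msk]                   # boolean indexing drops them directly
```

Plain boolean indexing (`df[msk]`) gets to the same rows in one step and leaves the integer columns as integers, which is probably why the other answer uses it.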

Upvotes: 0
