Reputation: 3973
I would like to filter my DataFrame by evaluating some conditions against several of its columns. I illustrate what I want to do with the following example:
import pandas as pd
df = {'user': [1,1,1,2,2,2],
'speed':[10,20,90,15,39, 10],
'acceleration': [9.8,29,5,4,7, 3],
'jerk':[50,60,60,40,20,-50],
'mode':['car','car','car','metro','metro', 'metro']}
df = pd.DataFrame.from_dict(df)
df
user speed acceleration jerk mode
0 1 10 9.8 50 car
1 1 20 29.0 60 car
2 1 90 5.0 60 car
3 2 15 4.0 40 metro
4 2 39 7.0 20 metro
5 2 10 3.0 -50 metro
In the given example, I would like to filter the dataframe based on thresholds set against the speed, acceleration and jerk columns, as in the table below:
+-------+-------+--------------+------------+
|       | speed | acceleration |    jerk    |
+-------+-------+--------------+------+-----+
|       | max   | max          | min  | max |
+-------+-------+--------------+------+-----+
| car   | 50    | 10           | -100 | 100 |
| metro | 35    | 5            | -40  | 60  |
+-------+-------+--------------+------+-----+
So only rows where the user's speed and acceleration are below the max, and the user's jerk is within the min-max range for that mode, are selected (in other words, rows not satisfying these conditions are deleted).
Upvotes: 2
Views: 48
Reputation: 6483
You can use reindex to repeat the thresholds once per row of df, and then build a boolean mask (msk):
threshold=threshold.reindex(df['mode'])
threshold=threshold.reset_index(drop=True)
msk=(df.acceleration.lt(threshold['acceleration','max']))&\
(df.speed.lt(threshold['speed','max']))&\
(df.jerk.ge(threshold['jerk','min'])&\
df.jerk.le(threshold['jerk','max']))
df[msk]
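The reset_index(drop=True) step is there because pandas compares Series by index label, so the reindexed thresholds need the same 0..n-1 index as df. If df does not have a default index, a variant that avoids the reset is to look up each limit with Series.map. This is only a sketch: the limit helper and the letter index are hypothetical names made up for the illustration, and the threshold table is typed in by hand.

```python
import pandas as pd

# Same data as in the question, but with a non-default index (assumption,
# chosen only to show that no reset_index is needed here).
df = pd.DataFrame({'user': [1, 1, 1, 2, 2, 2],
                   'speed': [10, 20, 90, 15, 39, 10],
                   'acceleration': [9.8, 29, 5, 4, 7, 3],
                   'jerk': [50, 60, 60, 40, 20, -50],
                   'mode': ['car', 'car', 'car', 'metro', 'metro', 'metro']},
                  index=list('abcdef'))

# Thresholds indexed by mode, with two-level columns as in the answer.
threshold = pd.DataFrame(
    [[10, 50, -100, 100],
     [5, 35, -40, 60]],
    index=['car', 'metro'],
    columns=pd.MultiIndex.from_tuples([('acceleration', 'max'),
                                       ('speed', 'max'),
                                       ('jerk', 'min'),
                                       ('jerk', 'max')]))

def limit(column, stat):
    """Hypothetical helper: per-row limit, carrying df's own index."""
    return df['mode'].map(threshold[(column, stat)])

msk = (df['acceleration'].lt(limit('acceleration', 'max'))
       & df['speed'].lt(limit('speed', 'max'))
       & df['jerk'].ge(limit('jerk', 'min'))
       & df['jerk'].le(limit('jerk', 'max')))

filtered = df[msk]
```

Because map returns a Series with the same index as df['mode'], the comparisons line up row by row whatever index df has.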
Details
Taking this threshold dataframe:
threshold=pd.DataFrame({'s':['car','car','metro','metro'],
'acceleration':[10,5,5,2],
'speed':[50,5,35,2],
'jerk':[-100,100,60,-40]})
threshold=threshold.groupby('s').agg({'acceleration':'max',
'speed':'max',
'jerk':['min','max']})
threshold
# acceleration speed jerk
# max max min max
#s
#car 10 50 -100 100
#metro 5 35 -40 60
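In case the threshold['jerk','min'] lookups look unusual: passing a list of aggregations to agg gives the result two-level columns, and each column is then addressed by a (column, aggregation) pair. A small sketch of that, where column_pairs, jerk_min and jerk_max are names introduced only for this illustration:

```python
import pandas as pd

threshold = pd.DataFrame({'s': ['car', 'car', 'metro', 'metro'],
                          'acceleration': [10, 5, 5, 2],
                          'speed': [50, 5, 35, 2],
                          'jerk': [-100, 100, 60, -40]})
threshold = threshold.groupby('s').agg({'acceleration': 'max',
                                        'speed': 'max',
                                        'jerk': ['min', 'max']})

# The columns are a MultiIndex of (column, aggregation) pairs.
column_pairs = list(threshold.columns)

# Selecting with a pair returns a Series indexed by mode ('car', 'metro').
jerk_min = threshold[('jerk', 'min')]
jerk_max = threshold[('jerk', 'max')]
```

threshold['jerk','min'] and threshold[('jerk','min')] are the same selection; the tuple form just makes the pair explicit.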
You can then reindex the thresholds by the 'mode' column of df, which gives one threshold row per row of df:
threshold=threshold.reindex(df['mode'])
# acceleration speed jerk
# max max min max
#mode
#car 10 50 -100 100
#car 10 50 -100 100
#car 10 50 -100 100
#metro 5 35 -40 60
#metro 5 35 -40 60
#metro 5 35 -40 60
threshold=threshold.reset_index(drop=True)
msk=(df.acceleration.lt(threshold['acceleration','max']))&\
(df.speed.lt(threshold['speed','max']))&\
(df.jerk.ge(threshold['jerk','min'])&\
df.jerk.le(threshold['jerk','max']))
df[msk]
# user speed acceleration jerk mode
#0 1 10 9.8 50 car
#3 2 15 4.0 40 metro
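If the thresholds are easier to keep as a flat table with one row per mode, roughly the same filter can be written with a merge instead of reindex. This is a sketch of a different technique from the one above, not a replacement for it; the limits table and its column names (speed_max, acceleration_max, jerk_min, jerk_max) are assumptions made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 2, 2, 2],
                   'speed': [10, 20, 90, 15, 39, 10],
                   'acceleration': [9.8, 29, 5, 4, 7, 3],
                   'jerk': [50, 60, 60, 40, 20, -50],
                   'mode': ['car', 'car', 'car', 'metro', 'metro', 'metro']})

# Hypothetical flat threshold table: one row per mode, one column per limit.
limits = pd.DataFrame({'mode': ['car', 'metro'],
                       'speed_max': [50, 35],
                       'acceleration_max': [10, 5],
                       'jerk_min': [-100, -40],
                       'jerk_max': [100, 60]})

# Attach the limits of each row's mode; validate= raises if a mode
# appears more than once in limits.
merged = df.merge(limits, on='mode', how='left', validate='many_to_one')

msk = (merged['speed'].lt(merged['speed_max'])
       & merged['acceleration'].lt(merged['acceleration_max'])
       & merged['jerk'].between(merged['jerk_min'], merged['jerk_max']))

# Apply the mask positionally so df's original index is irrelevant.
filtered = df[msk.to_numpy()]
```

On the example data this keeps the same two rows (index 0 and 3) as the reindex version.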
Upvotes: 2