Reputation: 272
Let's suppose we have this reproducible dataframe:
import pandas as pd
import numpy as np
#Dataframe:
df = pd.DataFrame({
'age': [20,15,34,40,60],
'rank': [2, 7, 6, 5, 1],
'height':[155, 159, 180 , 163, 170],
'weight':[53, 60, 75, 80, 49],
})
I'm trying to slice this dataframe based on pre-defined conditions, where each condition is a list like this [[threshold_value,name_of_column], [threshold_value2 ,name_of_column_2], ...]. These are the following conditions:
#Conditions:
condition_1 = [[16,'age'],[8,'rank'], [163, 'height']]
condition_2 = [[25,'age'], [60, 'weight'] ,[5,'rank']]
condition_3 = [[50,'age'], [3, 'rank']]
Is there a way to slice the dataframe with each combination of conditions correspondently ? (using pandas.query() or any slicing method)
I am able to do so but manually for each condition, something like this:
#Expected output:
output_df_1 = df[ (df['age'] < 16) & (df['rank'] < 8) & (df['height'] < 163)]
output_df_2 = df[ (df['age'] < 25) & (df['weight'] < 60) & (df['rank'] < 5)]
output_df_3 = df[ (df['age'] < 50) & (df['rank'] < 3)]
Any kind of help will be much appreciated, thank you !
Upvotes: 1
Views: 72
Reputation: 54718
You can do all of those &s separately:
pick = np.ones[df.shape[0]]
for value,key in condition_1:
pick = pick & (df[key] < value)
You then use that column of booleans to extract the rows you want, just as you did in your example.
output = df[pick]
Upvotes: 2
Reputation: 231385
In [30]: import pandas as pd
In [31]: df = pd.DataFrame({
...: 'age': [20,15,34,40,60],
...: 'rank': [2, 7, 6, 5, 1],
...: 'height':[155, 159, 180 , 163, 170],
...: 'weight':[53, 60, 75, 80, 49],
...: })
In [32]: condition_1 = [[16,'age'],[8,'rank'], [163, 'height']]
In [33]: df[ (df['age'] < 16) & (df['rank'] < 8) & (df['height'] < 163)]
Out[33]:
age rank height weight
1 15 7 159 60
In [34]: (df['age'] < 16) & (df['rank'] < 8) & (df['height'] < 163)
Out[34]:
0 False
1 True
2 False
3 False
4 False
dtype: bool
We could use a list comprehension to apply the tests, and logical_and.reduce
to combine them:
In [37]: np.logical_and.reduce([df[col]<val for val,col in condition_1])
Out[37]: array([False, True, False, False, False])
In [38]: df[np.logical_and.reduce([df[col]<val for val,col in condition_1])]
Out[38]:
age rank height weight
1 15 7 159 60
Or we could apply the tests to an array version of the frame. This is an example of numpy vectorization
.
rearrange condition_1
:
In [39]: values, cols = zip(*condition_1)
In [40]: values
Out[40]: (16, 8, 163)
In [41]: cols
Out[41]: ('age', 'rank', 'height')
The desired columns as array:
In [43]: df[list(cols)]
Out[43]:
age rank height
0 20 2 155
1 15 7 159
2 34 6 180
3 40 5 163
4 60 1 170
In [44]: df[list(cols)].values
Out[44]:
array([[ 20, 2, 155],
[ 15, 7, 159],
[ 34, 6, 180],
[ 40, 5, 163],
[ 60, 1, 170]])
apply the test:
In [45]: df[list(cols)].values<values
Out[45]:
array([[False, True, True],
[ True, True, True],
[False, True, False],
[False, True, False],
[False, True, False]])
(bad choice of values
variable in [39]). This applies <
to a (n,3) and (3,), which works by broadcasting
.
and find the row
that is True for all values - same as [37]
In [51]: (df[list(cols)].values<values).all(axis=1)
Out[51]: array([False, True, False, False, False])
Upvotes: 2