Slicing dataframes with varying conditions

Question

Let's suppose we have this reproducible dataframe:

import pandas as pd
import numpy as np

#Dataframe:
df = pd.DataFrame({
    'age': [20,15,34,40,60],
    'rank': [2, 7, 6, 5, 1],
    'height':[155, 159, 180 , 163, 170],
    'weight':[53, 60, 75, 80, 49],
                  })

I'm trying to slice this dataframe based on pre-defined conditions, where each condition is a list like this [[threshold_value,name_of_column], [threshold_value2 ,name_of_column_2], ...]. These are the following conditions:

#Conditions:
condition_1 = [[16,'age'],[8,'rank'], [163, 'height']]
condition_2 = [[25,'age'], [60, 'weight'] ,[5,'rank']]
condition_3 = [[50,'age'], [3, 'rank']]

Is there a way to slice the dataframe with each combination of conditions correspondently ? (using pandas.query() or any slicing method)

I am able to do so but manually for each condition, something like this:

#Expected output:
output_df_1 = df[ (df['age'] < 16) & (df['rank'] < 8) & (df['height'] < 163)]
output_df_2 = df[ (df['age'] < 25) & (df['weight'] < 60) & (df['rank'] < 5)]
output_df_3 = df[ (df['age'] < 50) & (df['rank'] < 3)]

Any kind of help will be much appreciated, thank you !

hpaulj · Accepted Answer

In [30]: import pandas as pd
In [31]: df = pd.DataFrame({
    ...:     'age': [20,15,34,40,60],
    ...:     'rank': [2, 7, 6, 5, 1],
    ...:     'height':[155, 159, 180 , 163, 170],
    ...:     'weight':[53, 60, 75, 80, 49],
    ...:                   })
In [32]: condition_1 = [[16,'age'],[8,'rank'], [163, 'height']]
In [33]: df[ (df['age'] < 16) & (df['rank'] < 8) & (df['height'] < 163)]
Out[33]: 
   age  rank  height  weight
1   15     7     159      60
In [34]: (df['age'] < 16) & (df['rank'] < 8) & (df['height'] < 163)
Out[34]: 
0    False
1     True
2    False
3    False
4    False
dtype: bool

We could use a list comprehension to apply the tests, and logical_and.reduce to combine them:

In [37]: np.logical_and.reduce([df[col]


Or we could apply the tests to an array version of the frame.  This is an example of numpy vectorization.
rearrange condition_1:
In [39]: values, cols = zip(*condition_1)
In [40]: values
Out[40]: (16, 8, 163)
In [41]: cols
Out[41]: ('age', 'rank', 'height')

The desired columns as array:
In [43]: df[list(cols)]
Out[43]: 
   age  rank  height
0   20     2     155
1   15     7     159
2   34     6     180
3   40     5     163
4   60     1     170
In [44]: df[list(cols)].values
Out[44]: 
array([[ 20,   2, 155],
       [ 15,   7, 159],
       [ 34,   6, 180],
       [ 40,   5, 163],
       [ 60,   1, 170]])

apply the test:
In [45]: df[list(cols)].values

(bad choice of values variable in [39]).  This applies < to a (n,3) and (3,), which works by broadcasting.
and find the row that is True for all values - same as [37]
In [51]: (df[list(cols)].values

Slicing dataframes with varying conditions

Answers (2)

Related Questions