Keyser Soze
Keyser Soze

Reputation: 272

Slicing dataframes with varying conditions

Let's suppose we have this reproducible dataframe:

import pandas as pd
import numpy as np

#Dataframe:
df = pd.DataFrame({
    'age': [20,15,34,40,60],
    'rank': [2, 7, 6, 5, 1],
    'height':[155, 159, 180 , 163, 170],
    'weight':[53, 60, 75, 80, 49],
                  })

I'm trying to slice this dataframe based on pre-defined conditions, where each condition is a list like this [[threshold_value,name_of_column], [threshold_value2 ,name_of_column_2], ...]. These are the following conditions:

#Conditions:
condition_1 = [[16,'age'],[8,'rank'], [163, 'height']]
condition_2 = [[25,'age'], [60, 'weight'] ,[5,'rank']]
condition_3 = [[50,'age'], [3, 'rank']]

Is there a way to slice the dataframe with each combination of conditions correspondently ? (using pandas.query() or any slicing method)

I am able to do so but manually for each condition, something like this:

#Expected output:
output_df_1 = df[ (df['age'] < 16) & (df['rank'] < 8) & (df['height'] < 163)]
output_df_2 = df[ (df['age'] < 25) & (df['weight'] < 60) & (df['rank'] < 5)]
output_df_3 = df[ (df['age'] < 50) & (df['rank'] < 3)]

Any kind of help will be much appreciated, thank you !

Upvotes: 1

Views: 72

Answers (2)

Tim Roberts
Tim Roberts

Reputation: 54718

You can do all of those &s separately:

pick = np.ones[df.shape[0]]
for value,key in condition_1:
    pick = pick & (df[key] < value)

You then use that column of booleans to extract the rows you want, just as you did in your example.

output = df[pick]

Upvotes: 2

hpaulj
hpaulj

Reputation: 231385

In [30]: import pandas as pd
In [31]: df = pd.DataFrame({
    ...:     'age': [20,15,34,40,60],
    ...:     'rank': [2, 7, 6, 5, 1],
    ...:     'height':[155, 159, 180 , 163, 170],
    ...:     'weight':[53, 60, 75, 80, 49],
    ...:                   })
In [32]: condition_1 = [[16,'age'],[8,'rank'], [163, 'height']]
In [33]: df[ (df['age'] < 16) & (df['rank'] < 8) & (df['height'] < 163)]
Out[33]: 
   age  rank  height  weight
1   15     7     159      60
In [34]: (df['age'] < 16) & (df['rank'] < 8) & (df['height'] < 163)
Out[34]: 
0    False
1     True
2    False
3    False
4    False
dtype: bool

We could use a list comprehension to apply the tests, and logical_and.reduce to combine them:

In [37]: np.logical_and.reduce([df[col]<val for val,col in condition_1])
Out[37]: array([False,  True, False, False, False])
In [38]: df[np.logical_and.reduce([df[col]<val for val,col in condition_1])]
Out[38]: 
   age  rank  height  weight
1   15     7     159      60

Or we could apply the tests to an array version of the frame. This is an example of numpy vectorization.

rearrange condition_1:

In [39]: values, cols = zip(*condition_1)
In [40]: values
Out[40]: (16, 8, 163)
In [41]: cols
Out[41]: ('age', 'rank', 'height')

The desired columns as array:

In [43]: df[list(cols)]
Out[43]: 
   age  rank  height
0   20     2     155
1   15     7     159
2   34     6     180
3   40     5     163
4   60     1     170
In [44]: df[list(cols)].values
Out[44]: 
array([[ 20,   2, 155],
       [ 15,   7, 159],
       [ 34,   6, 180],
       [ 40,   5, 163],
       [ 60,   1, 170]])

apply the test:

In [45]: df[list(cols)].values<values
Out[45]: 
array([[False,  True,  True],
       [ True,  True,  True],
       [False,  True, False],
       [False,  True, False],
       [False,  True, False]])

(bad choice of values variable in [39]). This applies < to a (n,3) and (3,), which works by broadcasting.

and find the row that is True for all values - same as [37]

In [51]: (df[list(cols)].values<values).all(axis=1)
Out[51]: array([False,  True, False, False, False])

Upvotes: 2

Related Questions