How to create more efficient boolean logic code that compares multiple rows of one column to another?

Question

I want an efficient and general approach to creating a new Boolean column in a data-frame based on comparison of multiple rows of a column to multiple rows of other column(s). By multple rows, I mean the current row of a column and one or more lags of that column (i.e. df['column'].shift(1). Right now I create a logical comparison for every column in my dataframe. In the example below, I only have 3 columns. I am hoping there is a way to use any() or all() to create more efficient logical comparison for the case when I might have 50 columns to check.

I have already successfully created the column by writing detailed conditions for each column and placing the criteria in a np.where() statement. This approach will not work if I have to check 50 columns.

This code produces the original dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame({'C1':[3,2,8,6,6,7,8],'C2':[5,4,6,4,8,3,2],'C3': [5,4,6,7,8,6,4],
                   'Year':[2012,2013,2014,2015,2016,2017,2018]})
df.set_index('Year', inplace=True)
df1 =df[::-1]
df1

For the example I am giving, I seek to create the column 'Success' which is 1 if the cany column is 8 and the column below it (previous year) is less than 8. Here is what I have done so far:

Criteria_1 = ( (df1['C1']==8) & (df1['C1'] > df1['C1'].shift(-1) ) |
               (df1['C2']==8) & (df1['C2'] > df1['C2'].shift(-1) )|
               (df1['C3']==8) & (df1['C3'] > df1['C3'].shift(-1) )
              )
df1['Sucess']=np.where(Criteria_1,1,0)

I would like to see the same output as produced by the code above but with more terse code using maybe any() so that I can say if any column meets the above condition success should equal 1.

anky · Accepted Answer

May be something like:

df1 =df[::-1].copy()
df1['Success']=(df1.eq(8)&df1.gt(df1.shift(-1))).any(axis=1).astype(int)
print(df1)

      C1  C2  C3  Success
Year                     
2018   8   2   4        1
2017   7   3   6        0
2016   6   8   8        1
2015   6   4   7        0
2014   8   6   6        1
2013   2   4   4        0
2012   3   5   5        0

How to create more efficient boolean logic code that compares multiple rows of one column to another?

Answers (1)

Related Questions