Reputation: 235
I want an efficient and general approach to creating a new Boolean column in a data-frame based on comparison of multiple rows of a column to multiple rows of other column(s). By multple rows, I mean the current row of a column and one or more lags of that column (i.e. df['column'].shift(1). Right now I create a logical comparison for every column in my dataframe. In the example below, I only have 3 columns. I am hoping there is a way to use any() or all() to create more efficient logical comparison for the case when I might have 50 columns to check.
I have already successfully created the column by writing detailed conditions for each column and placing the criteria in a np.where() statement. This approach will not work if I have to check 50 columns.
This code produces the original dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'C1':[3,2,8,6,6,7,8],'C2':[5,4,6,4,8,3,2],'C3': [5,4,6,7,8,6,4],
'Year':[2012,2013,2014,2015,2016,2017,2018]})
df.set_index('Year', inplace=True)
df1 =df[::-1]
df1
For the example I am giving, I seek to create the column 'Success' which is 1 if the cany column is 8 and the column below it (previous year) is less than 8. Here is what I have done so far:
Criteria_1 = ( (df1['C1']==8) & (df1['C1'] > df1['C1'].shift(-1) ) |
(df1['C2']==8) & (df1['C2'] > df1['C2'].shift(-1) )|
(df1['C3']==8) & (df1['C3'] > df1['C3'].shift(-1) )
)
df1['Sucess']=np.where(Criteria_1,1,0)
I would like to see the same output as produced by the code above but with more terse code using maybe any() so that I can say if any column meets the above condition success should equal 1.
Upvotes: 2
Views: 141
Reputation: 75120
May be something like:
df1 =df[::-1].copy()
df1['Success']=(df1.eq(8)&df1.gt(df1.shift(-1))).any(axis=1).astype(int)
print(df1)
C1 C2 C3 Success
Year
2018 8 2 4 1
2017 7 3 6 0
2016 6 8 8 1
2015 6 4 7 0
2014 8 6 6 1
2013 2 4 4 0
2012 3 5 5 0
Upvotes: 4