Reputation: 47
Here is my input dataframe:
df = pd.DataFrame({'Company':['A','B','C','D','E','F'],'Industry':['Metals','Metals','IT','IT','IT','banking'],'ROE':[10,9,5,14,1,9],'ROCE':[10,5,5,1,10,9],'Threshold':[1,1,2,2,2,1]});df
Need output as follows:
dfout = pd.DataFrame({'Company':['A','D','E','F'],'Industry':['Metals','IT','IT','banking'],'ROE':[10,14,1,9],'ROCE':[10,1,10,9],'Threshold':[1,2,2,1]});dfout
Logic: To fetch rows with top 'n' ROE and ROCE per industry. N is 'Threshold' column in the dataframe. Appreciate your inputs on this. Thank you.
Upvotes: 0
Views: 156
Reputation: 402283
First, sort your data by ROE/ROCE:
df = df.iloc[(-np.maximum(df.ROCE, df.ROE)).argsort()]
Next, use groupby
+ apply
:
df.groupby('Industry', group_keys=False, sort=False).apply(
lambda x: x[:x['Threshold'].unique().item()]
).sort_index()
Or,
df.groupby('Industry', group_keys=False, sort=False).apply(
lambda x: x.head(x['Threshold'].unique().item())
).sort_index()
Company Industry ROCE ROE Threshold
0 A Metals 10 10 1
3 D IT 1 14 2
4 E IT 10 1 2
5 F banking 9 9 1
Upvotes: 1