CLS
CLS

Reputation: 47

Return top n rows based on threshold from pandas dataframe

Here is my input dataframe:

df = pd.DataFrame({'Company':['A','B','C','D','E','F'],'Industry':['Metals','Metals','IT','IT','IT','banking'],'ROE':[10,9,5,14,1,9],'ROCE':[10,5,5,1,10,9],'Threshold':[1,1,2,2,2,1]});df

Need output as follows:

dfout = pd.DataFrame({'Company':['A','D','E','F'],'Industry':['Metals','IT','IT','banking'],'ROE':[10,14,1,9],'ROCE':[10,1,10,9],'Threshold':[1,2,2,1]});dfout

Logic: To fetch rows with top 'n' ROE and ROCE per industry. N is 'Threshold' column in the dataframe. Appreciate your inputs on this. Thank you.

Upvotes: 0

Views: 156

Answers (1)

cs95
cs95

Reputation: 402283

First, sort your data by ROE/ROCE:

df = df.iloc[(-np.maximum(df.ROCE, df.ROE)).argsort()]

Next, use groupby + apply:

df.groupby('Industry', group_keys=False, sort=False).apply(
    lambda x: x[:x['Threshold'].unique().item()]
).sort_index()

Or,

df.groupby('Industry', group_keys=False, sort=False).apply(
    lambda x: x.head(x['Threshold'].unique().item())
).sort_index()

  Company Industry  ROCE  ROE  Threshold
0       A   Metals    10   10          1
3       D       IT     1   14          2
4       E       IT    10    1          2
5       F  banking     9    9          1

Upvotes: 1

Related Questions