yankeefan11
yankeefan11

Reputation: 485

Creating new DataFrame based on old groupby

I posted this question yesterday about making a new column in a df. Now I am curious how to make a new dataframe that contains just the extreme elements. For example:

df = pd.DataFrame({'Event':['A','A','A','A', 'A' ,'B','B','B','B','B'],  'Number':[1,2,3,4,5,6,7,8,9,10],'Ref':[False,False,False,False,True,False,False,False,True,False]})
df["new"] = df.Number - df.Number[df.groupby('Event')['Ref'].transform('idxmax')].reset_index(drop=True)
print(df)

That gives the df in table 1. Now I am curious as to how I could create a new df1 that is just the rows corresponding to the largest absolute value of new. The output would be Output2 below. I know that I can utilize something like df1 = df.loc([df['new'].idxmin()) but that only gives rows. I am not sure how to loop through the different groups and also how to apply a numpy function as well. I et this is a one liner, but I am not too sure how to approach

Output 1:

  Event  Number    Ref  new
0     A       1  False   -4
1     A       2  False   -3
2     A       3  False   -2
3     A       4  False   -1
4     A       5   True    0
5     B       6  False   -3
6     B       7  False   -2
7     B       8  False   -1
8     B       9   True    0
9     B      10  False    1

Output 2:

  Event  Number    Ref  new
0     A       1  False   -4
1     B       6  False   -3

Upvotes: 1

Views: 189

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150825

Let me try answering your extended question here with a merge

new_df = pd.merge(df.loc[df['new'].abs().groupby(df['Event']).idxmax()],
         df.loc[df['Ref'], ['Event','Number']],
         on='Event',
         suffixes=['','_ref']
        )

Output:

  Event  Number    Ref  new  Number_ref
0     A       1  False   -4           5
1     B       6  False   -3           9

Upvotes: 2

Related Questions