Reputation: 485
I posted this question yesterday about making a new column in a df. Now I am curious how to make a new dataframe that contains just the extreme elements. For example:
df = pd.DataFrame({'Event':['A','A','A','A', 'A' ,'B','B','B','B','B'], 'Number':[1,2,3,4,5,6,7,8,9,10],'Ref':[False,False,False,False,True,False,False,False,True,False]})
df["new"] = df.Number - df.Number[df.groupby('Event')['Ref'].transform('idxmax')].reset_index(drop=True)
print(df)
That gives the df in table 1. Now I am curious as to how I could create a new df1 that is just the rows corresponding to the largest absolute value of new. The output would be Output2 below. I know that I can utilize something like df1 = df.loc([df['new'].idxmin())
but that only gives rows. I am not sure how to loop through the different groups and also how to apply a numpy function as well. I et this is a one liner, but I am not too sure how to approach
Output 1:
Event Number Ref new
0 A 1 False -4
1 A 2 False -3
2 A 3 False -2
3 A 4 False -1
4 A 5 True 0
5 B 6 False -3
6 B 7 False -2
7 B 8 False -1
8 B 9 True 0
9 B 10 False 1
Output 2:
Event Number Ref new
0 A 1 False -4
1 B 6 False -3
Upvotes: 1
Views: 189
Reputation: 150825
Let me try answering your extended question here with a merge
new_df = pd.merge(df.loc[df['new'].abs().groupby(df['Event']).idxmax()],
df.loc[df['Ref'], ['Event','Number']],
on='Event',
suffixes=['','_ref']
)
Output:
Event Number Ref new Number_ref
0 A 1 False -4 5
1 B 6 False -3 9
Upvotes: 2