camerond12
camerond12

Reputation: 71

Get column value corresponding to the idxmax from another column per group

I have a dataframe consisting of 3 columns and n rows.

My dataframe before grouping looks like

Index    Max_Mass (kg/m)    Max_Diameter (m)
1             10                   1
2             20                   2
3             30                   3

200           5                    4
201           60                   3
202           20                   2

300           90                   1
301           3                    1
302           10                   1

400           100                  1
401           10                   1
402           10                   1

I group the dataframe by cutting it every 100 rows so that I can find the maximum value of a specific column every 100th row using:

groups = output_df.groupby(pd.cut(output_df.index, range(0,len(output_df), 100)))

I am using the below to find the maximums of column 'Max Mass (kg/m)':

groups.max()['Max Mass (kg/m)']

I now want to make another df which will contain the max value found and the index of that value. How do I retrieve the index? I have tried using the below, but from my understanding, it only works for a single value, while the line above returns me a column of all maximums.

(groups.max()['Max Mass (kg/m)']).getidx()

My expected output (for the DataFrame above) would be

My new dataframe that I want to create should look like;

Index    Max_Mass (kg/m)    Max_Diameter (m)
3             30                   3
201           60                   3
300           90                   1
400           100                  1

Upvotes: 3

Views: 2417

Answers (2)

cs95
cs95

Reputation: 402814

Comments inlined.

# Initialise the grouper.
grouper = df.Index // 100
# Get list of indices corresponding to the max using `apply`.
idx = df.groupby(grouper).apply(
          lambda x: x.set_index('Index')['Max_Mass (kg/m)'].idxmax())
# Compute the max and update the other columns based on `idx` computed previously.
v = df.groupby(grouper, as_index=False)['Max_Mass (kg/m)'].max()
v['Index'] = idx.values
v['Max_Diameter (m)'] = df.loc[df.Index.isin(v.Index), 'Max_Diameter (m)'].values

print(v)
   Max_Mass (kg/m)  Index  Max_Diameter (m)
0               30      3                 3
1               60    201                 3
2               90    300                 1
3              100    400                 1

Upvotes: 1

slayer
slayer

Reputation: 652

Instead of using groups.max() you can use groups.idxmax(). Then use the index to get the max values. Now you have everything you need.

Upvotes: 0

Related Questions