disp_name
disp_name

Reputation: 1488

Pandas: GroupBy and Order Groups based on max value in each group

I have a Pandas DataFrame which contains tracks, score and some other columns.

I want to group by on "tracks" and then sorting of those groups based on the maximum value on the "score".

Example DataFrame:

tracks       score
20            2.2
20            1.5
25            3.5
24            1.2
24            5.5

Expected Output (I want to compare the highest values from each group and sort all the group from highest to lowest, though I don't want to lose any other data - means I want to show all the rows):

tracks       score
24            5.5
              1.2
25            3.5
20            2.2
              1.5

Currently, I am getting following output (my score is sorted but after group by my tracks are sorted based on the track numbers):

    tracks       score
20            2.2
              1.5
24            5.5
              4.2
25            3.5

My Approach till now: 1. I've sorted all the values by the score

sub_df = sub_df.sort_values("score")
  1. Then I am doing the following to get the output (I need in dictionary format):

url_dict = sub_df.groupby('track')['url'].apply(list).to_dict()

I also tried to used OrderedDict but it is of no use (at least for now) as the groupBy command is sending the wrong sequence of data.

Pandas = 0.23, Python = 3.6.4

Upvotes: 4

Views: 6299

Answers (2)

R.V
R.V

Reputation: 100

may be a bit long but this is what i use:

first set the index:

df.set_index(['x', 'y'], inplace=True)

then use the groupby on a double sorted dataframe based on the index

new_df = df.groupby(level=[0,1]).sum().reset_index() \
          .sort_values(['x', 'y'], ascending=[1,0]).groupby('x').head(1)

this will show only the max value, if you want to see all, remove the "head(1)".

Upvotes: 2

jezrael
jezrael

Reputation: 862511

Create helper column by GroupBy.transform and sort by multiple columns by DataFrame.sort_values, last remove helper column:

sub_df['max'] = sub_df.groupby('tracks')['score'].transform('max')

sub_df = sub_df.sort_values(["max","score"], ascending=False).drop('max', axis=1)
#if necessary sorting also by tracks column
#sub_df = sub_df.sort_values(["max","tracks","score"], ascending=False).drop('max', axis=1)
print (sub_df)
   tracks  score
4      24    5.5
3      24    1.2
2      25    3.5
0      20    2.2
1      20    1.5

Upvotes: 5

Related Questions