Reputation: 1488
I have a Pandas DataFrame which contains tracks, score and some other columns.
I want to group by on "tracks" and then sorting of those groups based on the maximum value on the "score".
Example DataFrame:
tracks score
20 2.2
20 1.5
25 3.5
24 1.2
24 5.5
Expected Output (I want to compare the highest values from each group and sort all the group from highest to lowest, though I don't want to lose any other data - means I want to show all the rows):
tracks score
24 5.5
1.2
25 3.5
20 2.2
1.5
Currently, I am getting following output (my score is sorted but after group by my tracks are sorted based on the track numbers):
tracks score
20 2.2
1.5
24 5.5
4.2
25 3.5
My Approach till now: 1. I've sorted all the values by the score
sub_df = sub_df.sort_values("score")
url_dict = sub_df.groupby('track')['url'].apply(list).to_dict()
I also tried to used OrderedDict but it is of no use (at least for now) as the groupBy command is sending the wrong sequence of data.
Pandas = 0.23, Python = 3.6.4
Upvotes: 4
Views: 6299
Reputation: 100
may be a bit long but this is what i use:
first set the index:
df.set_index(['x', 'y'], inplace=True)
then use the groupby on a double sorted dataframe based on the index
new_df = df.groupby(level=[0,1]).sum().reset_index() \
.sort_values(['x', 'y'], ascending=[1,0]).groupby('x').head(1)
this will show only the max value, if you want to see all, remove the "head(1)".
Upvotes: 2
Reputation: 862511
Create helper column by GroupBy.transform
and sort by multiple columns by DataFrame.sort_values
, last remove helper column:
sub_df['max'] = sub_df.groupby('tracks')['score'].transform('max')
sub_df = sub_df.sort_values(["max","score"], ascending=False).drop('max', axis=1)
#if necessary sorting also by tracks column
#sub_df = sub_df.sort_values(["max","tracks","score"], ascending=False).drop('max', axis=1)
print (sub_df)
tracks score
4 24 5.5
3 24 1.2
2 25 3.5
0 20 2.2
1 20 1.5
Upvotes: 5