Reputation: 575
Here is the original dataframe:
Name Version Cost
A 0.0.3 1.7
C 0.0.2 2.5
A 0.0.1 1.0
C 0.0.1 2.4
B 0.0.2 3.7
B 0.0.1 3.5
A 0.0.2 1.4
C 0.0.3 2.6
B 0.0.3 3.8
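(For reference, a minimal sketch to rebuild this frame with a default integer index, using exactly the values shown above:)
import pandas as pd

df = pd.DataFrame({
    "Name":    ["A", "C", "A", "C", "B", "B", "A", "C", "B"],
    "Version": ["0.0.3", "0.0.2", "0.0.1", "0.0.1", "0.0.2",
                "0.0.1", "0.0.2", "0.0.3", "0.0.3"],
    "Cost":    [1.7, 2.5, 1.0, 2.4, 3.7, 3.5, 1.4, 2.6, 3.8],
})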
After grouping and sorting within groups using the following code:
df = df.sort_values(['Name', 'Version'], ascending=[True, False])
df = df.groupby(['Name'], sort=False)
df = df.apply(lambda x: x.sort_values(['Cost'], ascending=False))
Now I have this dataframe, where Cost is sorted within each group and the groups are ordered alphabetically.
Name Version Cost
Name
A A 0.0.3 1.7
A 0.0.2 1.4
A 0.0.1 1.0
B B 0.0.3 3.8
B 0.0.2 3.7
B 0.0.1 3.5
C C 0.0.3 2.6
C 0.0.2 2.5
C 0.0.1 2.4
The question is: now I would like to sort the groups by the total cost of each group, so the expected result looks like this:
Name Version Cost
B 0.0.3 3.8
B 0.0.2 3.7
B 0.0.1 3.5
C 0.0.3 2.6
C 0.0.2 2.5
C 0.0.1 2.4
A 0.0.3 1.7
A 0.0.2 1.4
A 0.0.1 1.0
How can I achieve this without losing any rows?
Upvotes: 1
Views: 77
Reputation: 153460
Try:
df.assign(groupsum=df.groupby(level=0)['Cost'].transform('sum'))\
  .sort_values(['groupsum', 'Version'], ascending=False)
Output:
Name Version Cost groupsum
Name
B 8 B 0.0.3 3.8 11.0
4 B 0.0.2 3.7 11.0
5 B 0.0.1 3.5 11.0
C 7 C 0.0.3 2.6 7.5
1 C 0.0.2 2.5 7.5
3 C 0.0.1 2.4 7.5
A 0 A 0.0.3 1.7 4.1
6 A 0.0.2 1.4 4.1
2 A 0.0.1 1.0 4.1
And you can add reset_index(drop=True) at the end.
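The full chain would then look roughly like this (a sketch; the same assign/sort_values as above, with the index reset appended):
(df.assign(groupsum=df.groupby(level=0)['Cost'].transform('sum'))
   .sort_values(['groupsum', 'Version'], ascending=False)
   .reset_index(drop=True))
which gives: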
Name Version Cost groupsum
0 B 0.0.3 3.8 11.0
1 B 0.0.2 3.7 11.0
2 B 0.0.1 3.5 11.0
3 C 0.0.3 2.6 7.5
4 C 0.0.2 2.5 7.5
5 C 0.0.1 2.4 7.5
6 A 0.0.3 1.7 4.1
7 A 0.0.2 1.4 4.1
8 A 0.0.1 1.0 4.1
Or, using your "original" dataframe above:
df.assign(groupsum=df.groupby('Name')['Cost'].transform('sum'))\
  .sort_values(['groupsum', 'Version'], ascending=[False, False])
Output:
Name Version Cost groupsum
8 B 0.0.3 3.8 11.0
4 B 0.0.2 3.7 11.0
5 B 0.0.1 3.5 11.0
7 C 0.0.3 2.6 7.5
1 C 0.0.2 2.5 7.5
3 C 0.0.1 2.4 7.5
0 A 0.0.3 1.7 4.1
6 A 0.0.2 1.4 4.1
2 A 0.0.1 1.0 4.1
Upvotes: 1
Reputation: 22493
You can use the key argument in sort_values to achieve the same result as the other answers:
# the key ignores the column passed to it and sorts by each row's group total instead
print(df.sort_values("Cost", ascending=False,
                     key=lambda _: df.groupby("Name")["Cost"].transform("sum")))
Name Version Cost
3 B 0.0.3 3.8
4 B 0.0.2 3.7
5 B 0.0.1 3.5
6 C 0.0.3 2.6
7 C 0.0.2 2.5
8 C 0.0.1 2.4
0 A 0.0.3 1.7
1 A 0.0.2 1.4
2 A 0.0.1 1.0
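If you also want to force Version to be descending within each group (rather than relying on the frame's existing row order), one option, sketched here as an extension of the same idea, is to sort by both columns and have the key replace only the Cost column with its group total:
print(df.sort_values(["Cost", "Version"], ascending=False,
                     key=lambda s: df.groupby("Name")["Cost"].transform("sum")
                                   if s.name == "Cost" else s))
The key function is applied to each column listed in by independently, so Version is left untouched while Cost is swapped for the per-group sum.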
Upvotes: 1
Reputation: 18306
Starting from your original dataframe, you can generate a helper column of group sums with transform and then sort by that column and Version, both in descending order:
group_sums = df.groupby("Name").Cost.transform("sum")
out = (df.assign(sorter=group_sums)
         .sort_values(["sorter", "Version"], ascending=False, ignore_index=True)
         .drop(columns="sorter"))
where we drop the helper column sorter after sorting, to get
>>> out
Name Version Cost
0 B 0.0.3 3.8
1 B 0.0.2 3.7
2 B 0.0.1 3.5
3 C 0.0.3 2.6
4 C 0.0.2 2.5
5 C 0.0.1 2.4
6 A 0.0.3 1.7
7 A 0.0.2 1.4
8 A 0.0.1 1.0
Upvotes: 1
Reputation: 195418
You can create a temporary column, sort by it, and then drop that column:
df["tmp"] = df.groupby("Name")["Cost"].transform("sum")
df = df.sort_values(by="tmp", ascending=False).drop(columns="tmp")
print(df)
Prints:
Name Version Cost
3 B 0.0.3 3.8
4 B 0.0.2 3.7
5 B 0.0.1 3.5
6 C 0.0.3 2.6
7 C 0.0.2 2.5
8 C 0.0.1 2.4
0 A 0.0.3 1.7
1 A 0.0.2 1.4
2 A 0.0.1 1.0
df used:
Name Version Cost
A 0.0.3 1.7
A 0.0.2 1.4
A 0.0.1 1.0
B 0.0.3 3.8
B 0.0.2 3.7
B 0.0.1 3.5
C 0.0.3 2.6
C 0.0.2 2.5
C 0.0.1 2.4
Upvotes: 1