Reputation: 575
Here is the original dataframe:
Name Version Cost
A 0.0.3 1.7
C 0.0.2 2.5
A 0.0.1 1.0
C 0.0.1 2.4
B 0.0.2 3.7
B 0.0.1 3.5
A 0.0.2 1.4
C 0.0.3 2.6
B 0.0.3 3.8
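(For reference, a minimal sketch to rebuild this frame with a default integer index, using exactly the values shown above:)
import pandas as pd

df = pd.DataFrame({
    "Name":    ["A", "C", "A", "C", "B", "B", "A", "C", "B"],
    "Version": ["0.0.3", "0.0.2", "0.0.1", "0.0.1", "0.0.2",
                "0.0.1", "0.0.2", "0.0.3", "0.0.3"],
    "Cost":    [1.7, 2.5, 1.0, 2.4, 3.7, 3.5, 1.4, 2.6, 3.8],
})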
After grouping and sorting within groups using the following code:
df = df.sort_values(['Name', 'Version'], ascending=[True, False])
df = df.groupby(['Name'], sort=False)
df = df.apply(lambda x: x.sort_values(['Cost'], ascending=False))
Now I have this dataframe, where Cost is sorted within each group and the groups are ordered alphabetically.
Name Version Cost
Name
A A 0.0.3 1.7
A 0.0.2 1.4
A 0.0.1 1.0
B B 0.0.3 3.8
B 0.0.2 3.7
B 0.0.1 3.5
C C 0.0.3 2.6
C 0.0.2 2.5
C 0.0.1 2.4
The question is: now I would like to sort the groups by the total cost of each group, so the expected result looks like this:
Name Version Cost
B 0.0.3 3.8
B 0.0.2 3.7
B 0.0.1 3.5
C 0.0.3 2.6
C 0.0.2 2.5
C 0.0.1 2.4
A 0.0.3 1.7
A 0.0.2 1.4
A 0.0.1 1.0
How can I achieve this without losing any rows?
Upvotes: 1
Views: 77
Reputation: 153460
Try:
df.assign(groupsum=df.groupby(level=0)['Cost'].transform('sum'))\
  .sort_values(['groupsum', 'Version'], ascending=False)
Output:
Name Version Cost groupsum
Name
B 8 B 0.0.3 3.8 11.0
4 B 0.0.2 3.7 11.0
5 B 0.0.1 3.5 11.0
C 7 C 0.0.3 2.6 7.5
1 C 0.0.2 2.5 7.5
3 C 0.0.1 2.4 7.5
A 0 A 0.0.3 1.7 4.1
6 A 0.0.2 1.4 4.1
2 A 0.0.1 1.0 4.1
And you can add reset_index(drop=True) at the end.
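The full chain would then look roughly like this (a sketch; the same assign/sort_values as above, with the index reset appended):
(df.assign(groupsum=df.groupby(level=0)['Cost'].transform('sum'))
   .sort_values(['groupsum', 'Version'], ascending=False)
   .reset_index(drop=True))
which gives: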
Name Version Cost groupsum
0 B 0.0.3 3.8 11.0
1 B 0.0.2 3.7 11.0
2 B 0.0.1 3.5 11.0
3 C 0.0.3 2.6 7.5
4 C 0.0.2 2.5 7.5
5 C 0.0.1 2.4 7.5
6 A 0.0.3 1.7 4.1
7 A 0.0.2 1.4 4.1
8 A 0.0.1 1.0 4.1
Or, using your "original" dataframe above:
df.assign(groupsum=df.groupby('Name')['Cost'].transform('sum'))\
  .sort_values(['groupsum', 'Version'], ascending=[False, False])
Output:
Name Version Cost groupsum
8 B 0.0.3 3.8 11.0
4 B 0.0.2 3.7 11.0
5 B 0.0.1 3.5 11.0
7 C 0.0.3 2.6 7.5
1 C 0.0.2 2.5 7.5
3 C 0.0.1 2.4 7.5
0 A 0.0.3 1.7 4.1
6 A 0.0.2 1.4 4.1
2 A 0.0.1 1.0 4.1
Upvotes: 1
Reputation: 22493
You can use the key argument in sort_values to achieve the same result as the other answers:
# the key ignores the column passed to it and sorts by each row's group total instead
print(df.sort_values("Cost", ascending=False,
                     key=lambda _: df.groupby("Name")["Cost"].transform("sum")))
Name Version Cost
3 B 0.0.3 3.8
4 B 0.0.2 3.7
5 B 0.0.1 3.5
6 C 0.0.3 2.6
7 C 0.0.2 2.5
8 C 0.0.1 2.4
0 A 0.0.3 1.7
1 A 0.0.2 1.4
2 A 0.0.1 1.0
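If you also want to force Version to be descending within each group (rather than relying on the frame's existing row order), one option, sketched here as an extension of the same idea, is to sort by both columns and have the key replace only the Cost column with its group total:
print(df.sort_values(["Cost", "Version"], ascending=False,
                     key=lambda s: df.groupby("Name")["Cost"].transform("sum")
                                   if s.name == "Cost" else s))
The key function is applied to each column listed in by independently, so Version is left untouched while Cost is swapped for the per-group sum.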
Upvotes: 1
Reputation: 18306
Starting from your original dataframe, you can generate a helper column of group sums with transform and then sort by that column and Version, both in descending order:
group_sums = df.groupby("Name").Cost.transform("sum")
out = (df.assign(sorter=group_sums)
         .sort_values(["sorter", "Version"], ascending=False, ignore_index=True)
         .drop(columns="sorter"))
where we drop the helper column sorter after sorting, to get
>>> out
Name Version Cost
0 B 0.0.3 3.8
1 B 0.0.2 3.7
2 B 0.0.1 3.5
3 C 0.0.3 2.6
4 C 0.0.2 2.5
5 C 0.0.1 2.4
6 A 0.0.3 1.7
7 A 0.0.2 1.4
8 A 0.0.1 1.0
Upvotes: 1
Reputation: 195418
You can create a temporary column, sort by it, and then drop that column:
df["tmp"] = df.groupby("Name")["Cost"].transform("sum")
df = df.sort_values(by="tmp", ascending=False).drop(columns="tmp")
print(df)
Prints:
Name Version Cost
3 B 0.0.3 3.8
4 B 0.0.2 3.7
5 B 0.0.1 3.5
6 C 0.0.3 2.6
7 C 0.0.2 2.5
8 C 0.0.1 2.4
0 A 0.0.3 1.7
1 A 0.0.2 1.4
2 A 0.0.1 1.0
df used:
Name Version Cost
A 0.0.3 1.7
A 0.0.2 1.4
A 0.0.1 1.0
B 0.0.3 3.8
B 0.0.2 3.7
B 0.0.1 3.5
C 0.0.3 2.6
C 0.0.2 2.5
C 0.0.1 2.4
Upvotes: 1