Sort pandas DataFrame by multiple columns and duplicated index

Question

I have a pandas DataFrame with duplicated indices. There are 3 rows with each index, and they correspond to a group of items. There are two columns, a and b.

df = pandas.DataFrame([{'i': b % 4, 'a': abs(b - 6) , 'b': b}
                       for b in range(12)]).set_index('i')

I want to sort the DataFrame so that:

All of the rows with the same indices are adjacent. (all of the groups are together)
The groups are in reverse order by the lowest value of a within the group.

For example, in the above df, the first three items should be the ones with index 0, because the lowest a value for those three rows is 2, and all of the other groups have at least one row with an a value lower than 2. The second three items could be either group 3 or group 1, because the lowest a value in both of those groups is 1. The last group of items should be group 2, because it has a row with an a value of 0.

Within each group, the items are sorted in ascending order by b.

Desired output:

I've been trying something like:

df.groupby('i')[['a']].transform(min).sort(['a', 'b'], ascending=[0, 1])

But it gives me a KeyError, and it only gets that far if I make i a column instead of an index anyway.

chrisb · Accepted Answer

The most straightforward way I see is moving your index to a column, and calculating a new column with the group min.

In [43]: df = df.reset_index()

In [45]: df['group_min'] = df.groupby('i')['a'].transform('min')

Then you can sort by your conditions:

In [49]: df.sort_values(['group_min', 'i', 'b'], ascending=[False, False, True])
Out[49]: 
    i  a   b  group_min
0   0  6   0          2
4   0  2   4          2
8   0  2   8          2
3   3  3   3          1
7   3  1   7          1
11  3  5  11          1
1   1  5   1          1
5   1  1   5          1
9   1  3   9          1
2   2  4   2          0
6   2  0   6          0
10  2  4  10          0

To get back to your desired frame, drop the tracking variable and reset the index.

In [50]: df.sort_values(['group_min', 'i', 'b'], ascending=[False, False, True]).drop('group_min', axis=1).set_index('i')
Out[50]: 
   a   b
i       
0  6   0
0  2   4
0  2   8
3  3   3
3  1   7
3  5  11
1  5   1
1  1   5
1  3   9
2  4   2
2  0   6
2  4  10

Sort pandas DataFrame by multiple columns and duplicated index

Answers (2)

Related Questions