AGS
AGS

Reputation: 14498

Sort by one column's values, keeping rows grouped by another column's value

I have two (hundreds) df's that are generated and then concatenated, which I would then like to sort while keeping the rows with identical column D names in the original order:

In [120]: df_list[0]
Out[120]:

   A         B         C         D
0  0.564678  0.598355  0.606693  MA0835
1  0.066291  0.063587  0.662292  MA0835
2  0.000000  0.000000  0.010758  MA0835
3  0.000000  0.000000  0.097895  MA0835
4  0.000000  0.000000  0.136468  MA0835

In [121]: df_list[1]
Out[121]:

   A         B         C         D
0  0.628844  0.614492  0.570333  MA1002
1  0.317790  0.293189  0.239368  MA1002
2  0.000000  0.000000  0.000000  MA1002
3  0.000000  0.000000  0.000000  MA1002
4  0.000000  0.000000  0.000000  MA1002

In [122]: df = pd.concat(df_list[0:2])

In [122]: df
Out[122]:
   A         B         C         D
0  0.564678  0.598355  0.606693  MA0835
1  0.066291  0.063587  0.662292  MA0835
2  0.000000  0.000000  0.010758  MA0835
3  0.000000  0.000000  0.097895  MA0835
4  0.000000  0.000000  0.136468  MA0835
0  0.628844  0.614492  0.570333  MA1002
1  0.317790  0.293189  0.239368  MA1002
2  0.000000  0.000000  0.000000  MA1002
3  0.000000  0.000000  0.000000  MA1002
4  0.000000  0.000000  0.000000  MA1002

Standard sorting produces:

In [125]: df.sort_values('A',ascending=False)
Out[125]:
   A         B         C         D
0  0.628844  0.614492  0.570333  MA1002
0  0.564678  0.598355  0.606693  MA0835
1  0.317790  0.293189  0.239368  MA1002
1  0.066291  0.063587  0.662292  MA0835
2  0.000000  0.000000  0.010758  MA0835
3  0.000000  0.000000  0.097895  MA0835
4  0.000000  0.000000  0.136468  MA0835
2  0.000000  0.000000  0.000000  MA1002
3  0.000000  0.000000  0.000000  MA1002
4  0.000000  0.000000  0.000000  MA1002

However, I would like to sort on A and keep the row-groupings as specified by D. This is the desired output:

   A         B         C         D   
0  0.628844  0.614492  0.570333  MA1002
1  0.317790  0.293189  0.239368  MA1002
2  0.000000  0.000000  0.000000  MA1002
3  0.000000  0.000000  0.000000  MA1002
4  0.000000  0.000000  0.000000  MA1002
0  0.564678  0.598355  0.606693  MA0835
1  0.066291  0.063587  0.662292  MA0835
2  0.000000  0.000000  0.010758  MA0835
3  0.000000  0.000000  0.097895  MA0835
4  0.000000  0.000000  0.136468  MA0835

Do I need to work with groupby, or is there another sorting/grouping technique I am unfamiliar with?

Upvotes: 4

Views: 195

Answers (1)

piRSquared
piRSquared

Reputation: 294218

Use the keys argument in pd.concat

keys = [(df.A.iloc[0], i) for i, df in enumerate(list_of_dfs)]
pd.concat(list_of_dfs, keys=keys) \
    .sort_index(ascending=[False, True, True]) \
    .reset_index(drop=True)

enter image description here

Upvotes: 2

Related Questions