Reputation: 14498
I have two (hundreds) df's that are generated and then concatenated, which I would then like to sort while keeping the rows with identical column D
names in the original order:
In [120]: df_list[0]
Out[120]:
A B C D
0 0.564678 0.598355 0.606693 MA0835
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
In [121]: df_list[1]
Out[121]:
A B C D
0 0.628844 0.614492 0.570333 MA1002
1 0.317790 0.293189 0.239368 MA1002
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002
In [122]: df = pd.concat(df_list[0:2])
In [122]: df
Out[122]:
A B C D
0 0.564678 0.598355 0.606693 MA0835
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
0 0.628844 0.614492 0.570333 MA1002
1 0.317790 0.293189 0.239368 MA1002
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002
Standard sorting produces:
In [125]: df.sort_values('A',ascending=False)
Out[125]:
A B C D
0 0.628844 0.614492 0.570333 MA1002
0 0.564678 0.598355 0.606693 MA0835
1 0.317790 0.293189 0.239368 MA1002
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002
However, I would like to sort on A
and keep the row-groupings as specified by D
. This is the desired output:
A B C D
0 0.628844 0.614492 0.570333 MA1002
1 0.317790 0.293189 0.239368 MA1002
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002
0 0.564678 0.598355 0.606693 MA0835
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
Do I need to work with groupby
, or is there another sorting/grouping technique I am unfamiliar with?
Upvotes: 4
Views: 195
Reputation: 294218
Use the keys
argument in pd.concat
keys = [(df.A.iloc[0], i) for i, df in enumerate(list_of_dfs)]
pd.concat(list_of_dfs, keys=keys) \
.sort_index(ascending=[False, True, True]) \
.reset_index(drop=True)
Upvotes: 2