Reputation: 31
I have two multi-indexed pandas dataframes that look like this:
>>> df1 = pd.DataFrame({
... ('y1', '0'): [1, 2, 3],
... ('y2', '0'): [4, 5, 6],
... ('y11', '0'): [7, 8, 9],
... })
>>> df2 = pd.DataFrame({
... ('y1', '1'): [1.5, 2.5, 3.5],
... ('y2', '1'): [4.5, 5.5, 6.5],
... ('y11', '1'): [7.5, 8.5, 9.5],
... })
I want to concatenate them so that the result looks like:
>>> df = pd.DataFrame({
... ('y1', '0'): [1, 2, 3],
... ('y1', '1'): [1.5, 2.5, 3.5],
... ('y2', '0'): [4, 5, 6],
... ('y2', '1'): [4.5, 5.5, 6.5],
... ('y11', '0'): [7, 8, 9],
... ('y11', '1'): [7.5, 8.5, 9.5],
... })
i.e., the order of the first level of the multi-index (y1, y2, y11) is preserved, while the second level is interleaved.
How can I concatenate the two multi-indexed dataframes so that the ordering of the first level of the multi-index is preserved?
If I use:
>>> df = pd.concat((df1, df2), axis="columns").sort_index(axis="columns")
it almost works, but the first level ends up sorted lexicographically, as y1, y11, y2:
>>> print(df)
  y1      y11      y2
   0    1   0    1  0    1
0  1  1.5   7  7.5  4  4.5
1  2  2.5   8  8.5  5  5.5
2  3  3.5   9  9.5  6  6.5
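The reordering comes from plain string comparison, which goes character by character, so 'y11' sorts before 'y2'. A minimal sketch with just the first-level labels (the `int(s[1:])` key is only an illustration and assumes every label is a single letter followed by digits):

```python
labels = ["y1", "y2", "y11"]

# Default string sort: compares character by character, so "y11" < "y2".
lexicographic = sorted(labels)

# Illustrative numeric key: drop the leading letter and compare the rest as an int.
numeric = sorted(labels, key=lambda s: int(s[1:]))

print(lexicographic)
print(numeric)
```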
I can do this using a complicated regex, but I think that there should be a better solution than this.
Upvotes: 3
Views: 64
Reputation: 13437
You might also use a regular expression to parse the digits and sort by treating them as numbers. This can be passed as a key function to DataFrame.sort_index if you specify a level:
import pandas as pd

df1 = pd.DataFrame({
    ('y1', '0'): [1, 2, 3],
    ('y2', '0'): [4, 5, 6],
    ('y11', '0'): [7, 8, 9],
})
df2 = pd.DataFrame({
    ('y1', '1'): [1.5, 2.5, 3.5],
    ('y2', '1'): [4.5, 5.5, 6.5],
    ('y11', '1'): [7.5, 8.5, 9.5],
})
df = (
    pd.concat((df1, df2), axis="columns")
    .sort_index(
        axis='columns',
        level=0,
        key=lambda idx: idx.str.extract(r'(\w)(\d+)').astype({1: int})
    )
)
print(df)
#   y1      y2      y11
#    0    1  0    1   0    1
# 0  1  1.5  4  4.5   7  7.5
# 1  2  2.5  5  5.5   8  8.5
# 2  3  3.5  6  6.5   9  9.5
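To see what the key function works with, here is a small sketch that applies the same pattern and cast to the first-level labels on their own. The names `labels` and `parts` are only for illustration. Note that `(\w)` captures a single character, so this assumes one-letter prefixes like the ones in the question.

```python
import pandas as pd

labels = pd.Index(["y1", "y2", "y11"])

# Same pattern and cast as in the key function:
# column 0 holds the character before the digits,
# column 1 holds the digits, converted to integers.
parts = labels.str.extract(r"(\w)(\d+)").astype({1: int})
print(parts)
```

Because column 1 is numeric, 11 compares greater than 2, which is what puts y11 after y2.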
Upvotes: 1
Reputation: 262254
One easy option could be to concat, sort_index, then restore the desired order using df1:
out = (pd.concat([df1, df2], axis=1)
         .sort_index(axis=1, level=0)
         [df1.columns.get_level_values(0)]
      )
If you can't rely on the original order and want to force a natural sort, use natsort:
from natsort import natsorted
out = (pd.concat([df1, df2], axis=1)
         .sort_index(axis=1, level=0)
         [natsorted(df1.columns.get_level_values(0))]
      )
Or:
from natsort import index_natsorted
out = pd.concat([df1, df2], axis=1)
out = out.iloc[:, index_natsorted(out.columns)]
## or
# out = (pd.concat([df1, df2], axis=1)
#          .iloc[:, lambda x: index_natsorted(x.columns)]
#       )
Output:
  y1      y2      y11
   0    1  0    1   0    1
0  1  1.5  4  4.5   7  7.5
1  2  2.5  5  5.5   8  8.5
2  3  3.5  6  6.5   9  9.5
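If installing natsort is not an option, the positional reordering can be sketched with the standard library. This is not natsort's implementation. `natural_key` is a hypothetical helper written for this example, and it assumes all labels on a level share the same text-then-digits shape (mixed shapes could raise a TypeError when a string is compared with an int).

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    ("y1", "0"): [1, 2, 3],
    ("y2", "0"): [4, 5, 6],
    ("y11", "0"): [7, 8, 9],
})
df2 = pd.DataFrame({
    ("y1", "1"): [1.5, 2.5, 3.5],
    ("y2", "1"): [4.5, 5.5, 6.5],
    ("y11", "1"): [7.5, 8.5, 9.5],
})


def natural_key(label):
    """Hypothetical helper: split a label into text and integer chunks."""
    return [int(tok) if tok.isdigit() else tok
            for tok in re.findall(r"\d+|\D+", label)]


out = pd.concat([df1, df2], axis=1)
cols = list(out.columns)  # list of (level 0, level 1) tuples

# Column positions ordered by the natural key of each level in turn.
order = sorted(
    range(len(cols)),
    key=lambda i: tuple(natural_key(part) for part in cols[i]),
)
out = out.iloc[:, order]
print(out)
```

Sorting positions and passing them to iloc mirrors the index_natsorted variant, so the data is selected once rather than sorted and then reindexed.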
Upvotes: 5