Sai-Aakash Ramesh

Reputation: 31

How to preserve order of columns when concatenating Multi-Indexed dataframes?

I have two multi-indexed pandas dataframes that look like this:

>>> df1 = pd.DataFrame({
...     ('y1', '0'): [1, 2, 3],
...     ('y2', '0'): [4, 5, 6],
...     ('y11', '0'): [7, 8, 9],
... })

>>> df2 = pd.DataFrame({
...     ('y1', '1'): [1.5, 2.5, 3.5],
...     ('y2', '1'): [4.5, 5.5, 6.5],
...     ('y11', '1'): [7.5, 8.5, 9.5],
... })

I want to concatenate them so that the result looks like:

>>> df = pd.DataFrame({
...    ('y1', '0'): [1, 2, 3],
...    ('y1', '1'): [1.5, 2.5, 3.5],
...    ('y2', '0'): [4, 5, 6],
...    ('y2', '1'): [4.5, 5.5, 6.5],
...    ('y11', '0'): [7, 8, 9],
...    ('y11', '1'): [7.5, 8.5, 9.5],
... })

That is, the order of the first level of the MultiIndex (y1, y2, y11) is preserved, while the second level is interleaved.

How can I concatenate the two multi-indexed dataframes so that the ordering of the first level of the MultiIndex is preserved?

If I use:

>>> df = pd.concat((df1, df2), axis="columns").sort_index(axis="columns")

it almost works, but the first level ends up sorted lexicographically as y1, y11, y2:

>>> print(df)
  y1      y11      y2     
   0    1   0    1  0    1
0  1  1.5   7  7.5  4  4.5
1  2  2.5   8  8.5  5  5.5
2  3  3.5   9  9.5  6  6.5

I can do this using a complicated regex, but I think that there should be a better solution than this.

Upvotes: 3

Views: 64

Answers (2)

Cameron Riddell

Reputation: 13437

You might also use a regular expression to parse the digits out of each label and sort on them as numbers rather than as text. DataFrame.sort_index accepts this as a key function when you also specify a level:

import pandas as pd

df1 = pd.DataFrame({
    ('y1', '0'): [1, 2, 3],
    ('y2', '0'): [4, 5, 6],
    ('y11', '0'): [7, 8, 9],
})

df2 = pd.DataFrame({
    ('y1', '1'): [1.5, 2.5, 3.5],
    ('y2', '1'): [4.5, 5.5, 6.5],
    ('y11', '1'): [7.5, 8.5, 9.5],
})

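df = (
    pd.concat((df1, df2), axis="columns")
    .sort_index(
        axis='columns',
        level=0,
        # Split each label into a letter and its number, e.g. 'y11' -> ('y', 11),
        # so the numeric part is compared as an integer rather than as text.
        key=lambda idx:
            idx.str.extract(r'(\w)(\d+)').astype({1: int})
    )
)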

print(df)
#   y1      y2      y11
#    0    1  0    1   0    1
# 0  1  1.5  4  4.5   7  7.5
# 1  2  2.5  5  5.5   8  8.5
# 2  3  3.5  6  6.5   9  9.5
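
To see why the numeric conversion matters, here is a minimal plain-Python sketch of the same idea, outside pandas. The helper name natural_key and its pattern are illustrative assumptions, not part of the code above; it assumes every label is a non-digit prefix followed by digits.

```python
import re

labels = ["y1", "y2", "y11"]

def natural_key(label):
    # Hypothetical helper: split 'y11' into ('y', 11) so that the
    # numeric suffix is compared as an integer, not character by character.
    prefix, digits = re.fullmatch(r"(\D*)(\d+)", label).groups()
    return prefix, int(digits)

plain = sorted(labels)                      # text comparison: '11' sorts before '2'
natural = sorted(labels, key=natural_key)   # numeric comparison of the suffix

print(plain)    # ['y1', 'y11', 'y2']
print(natural)  # ['y1', 'y2', 'y11']
```

The key passed to sort_index plays the same role as natural_key here, except that it receives the whole level at once and must return one sort value per label.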

Upvotes: 1

mozway

Reputation: 262254

One easy option could be to concat, sort_index, then restore the desired order using df1:

out = (pd.concat([df1, df2], axis=1)
          .sort_index(axis=1, level=0)
       [df1.columns.get_level_values(0)]
      )

If you can't rely on the original order and want to force a natural sort, use natsort:

from natsort import natsorted

out = (pd.concat([df1, df2], axis=1)
          .sort_index(axis=1, level=0)
       [natsorted(df1.columns.get_level_values(0))]
      )

Or:

from natsort import index_natsorted

out = pd.concat([df1, df2], axis=1)
out = out.iloc[:, index_natsorted(out.columns)]

## or
# out = (pd.concat([df1, df2], axis=1)
#          .iloc[:, lambda x: index_natsorted(x.columns)]
#       )

Output:

  y1      y2      y11     
   0    1  0    1   0    1
0  1  1.5  4  4.5   7  7.5
1  2  2.5  5  5.5   8  8.5
2  3  3.5  6  6.5   9  9.5
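
If you would rather avoid the extra dependency and the intermediate sort, a possible variant is to rank the columns by where their first-level label appears in df1 and reorder by position. This is only a sketch: the names rank and positions are illustrative, it assumes every first-level label in df2 also occurs in df1, and the second level is still compared as text (so '10' would sort before '2').

```python
import pandas as pd

df1 = pd.DataFrame({
    ("y1", "0"): [1, 2, 3],
    ("y2", "0"): [4, 5, 6],
    ("y11", "0"): [7, 8, 9],
})
df2 = pd.DataFrame({
    ("y1", "1"): [1.5, 2.5, 3.5],
    ("y2", "1"): [4.5, 5.5, 6.5],
    ("y11", "1"): [7.5, 8.5, 9.5],
})

out = pd.concat([df1, df2], axis=1)

# Rank each first-level label by its first appearance in df1
# (assumption: df2 introduces no new first-level labels).
rank = {label: i for i, label in enumerate(df1.columns.get_level_values(0).unique())}

# Order the column positions by (rank of first level, second-level label).
positions = sorted(
    range(out.shape[1]),
    key=lambda i: (rank[out.columns[i][0]], out.columns[i][1]),
)
out = out.iloc[:, positions]

print(out.columns.tolist())
```

Because the reordering is positional, it does not depend on how label-based selection treats a MultiIndex.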

Upvotes: 5
