william_grisaitis
william_grisaitis

Reputation: 5931

How do I split the level of a pandas.MultiIndex into two new levels?

I have a pandas.MultiIndex where i want to "split up" one of the levels in two:

For example:

level_1   level_2         
a1        b1/c1   
          b1/c2    
          b2/c1    

->

level_1   level_2_1  level_2_2 
a1        b1         c1     
                     c2     
          b2         c1     

what is a good, efficient way to do this?

One solution I'm aware of uses pd.MultiIndex.from_tuples:

pd.MultiIndex.from_tuples([
    (x, y[0], y[1])
    for x, y in zip(
        index.get_level_values("level_1"),
        index.get_level_values("level_2").map(lambda x: x.split("/")).map(tuple)
    )],
    names=["level_1", "level_2_1", "level_2_2"]
)

but this is super slow.

what's the best way to do this?

Upvotes: 1

Views: 1051

Answers (2)

Cameron Riddell
Cameron Riddell

Reputation: 13417

You could use the array constructor or the default constructor for the MultiIndex. The process would work like this:

  • break apart your index 'level_2' on the '/'
  • combine the 'level_1' from the old index with each part of the split index.
  • re-construct the MultiIndex

First lets construct a Series & MultiIndex to work with:

import pandas as pd

index = pd.MultiIndex.from_product(
    [['a1'], ['b1/c1', 'b1/c2', 'b2/c1']], names=['level_1', 'level_2']
)
s = pd.Series(0, index=index)

print(s)
level_1  level_2
a1       b1/c1      0
         b1/c2      0
         b2/c1      0
dtype: int64

Default MultiIndex Constructor

I haven't timed this, but my guess is that this will be the most performant solution.

split_idx = s.index.get_level_values('level_2').str.split('/', expand=True)

# combine old/new indexes here
codes = [s.index.codes[0], *split_idx.codes]
levels = [s.index.levels[0], *split_idx.levels]
names = [s.index.names[0], *[f'level_2_{i}' for i in range(1, split_idx.nlevels+1)]]

new_index = pd.MultiIndex(codes=codes, levels=levels, names=names)

print(new_index)
MultiIndex([('a1', 'b1', 'c1'),
            ('a1', 'b1', 'c2'),
            ('a1', 'b2', 'c1')],
           names=['level_1', 'level_2_1', 'level_2_2'])

print(s.set_axis(new_index))
level_1  level_2_1  level_2_2
a1       b1         c1           0
                    c2           0
         b2         c1           0
dtype: int64

MultiIndex.from_arrays Constructor

This is may be a more readable solution because the from_arrays constructor is more familiar than the default constructor.

split_idx = index.get_level_values(1).str.split('/', expand=True)

# Build new index structure, seeded with the first level of the original Index
arrays = [s.index.get_level_values(0)]
names = [s.index.names[0]]
for level in range(split_idx.nlevels):
    arrays.append(split_idx.get_level_values(level))
    names.append(f'level_2_{level+1}')

new_index = pd.MultiIndex.from_arrays(arrays, names=names)

print(new_index)
MultiIndex([('a1', 'b1', 'c1'),
            ('a1', 'b1', 'c2'),
            ('a1', 'b2', 'c1')],
           names=['level_1', 'level_2_1', 'level_2_2'])

print(s.set_axis(new_index))
level_1  level_2_1  level_2_2
a1       b1         c1           0
                    c2           0
         b2         c1           0
dtype: int64

Upvotes: 1

Frederick Zhang
Frederick Zhang

Reputation: 170

Not too sure about your exact use-case, please share more details on your input/output. I would try something like the following if the MultiIndex is on the index.

df.reset_index(inplace=True)
df[["level_2_1","level_2_2"]] = df["level_2"].str.split("/",expand=True)
df.drop(["level_2"],axis=1,inplace=True)
df.set_index(["level_1","level_2_1","level_2_2"])

hope it helps!

Upvotes: 2

Related Questions