Reputation: 5931
I have a pandas.MultiIndex
where i want to "split up" one of the levels in two:
For example:
level_1 level_2
a1 b1/c1
b1/c2
b2/c1
->
level_1 level_2_1 level_2_2
a1 b1 c1
c2
b2 c1
what is a good, efficient way to do this?
One solution I'm aware of uses pd.MultiIndex.from_tuples
:
pd.MultiIndex.from_tuples([
(x, y[0], y[1])
for x, y in zip(
index.get_level_values("level_1"),
index.get_level_values("level_2").map(lambda x: x.split("/")).map(tuple)
)],
names=["level_1", "level_2_1", "level_2_2"]
)
but this is super slow.
what's the best way to do this?
Upvotes: 1
Views: 1051
Reputation: 13417
You could use the array constructor or the default constructor for the MultiIndex
. The process would work like this:
'level_2'
on the '/'
'level_1'
from the old index with each part of the split index.MultiIndex
First lets construct a Series
& MultiIndex
to work with:
import pandas as pd
index = pd.MultiIndex.from_product(
[['a1'], ['b1/c1', 'b1/c2', 'b2/c1']], names=['level_1', 'level_2']
)
s = pd.Series(0, index=index)
print(s)
level_1 level_2
a1 b1/c1 0
b1/c2 0
b2/c1 0
dtype: int64
MultiIndex
ConstructorI haven't timed this, but my guess is that this will be the most performant solution.
split_idx = s.index.get_level_values('level_2').str.split('/', expand=True)
# combine old/new indexes here
codes = [s.index.codes[0], *split_idx.codes]
levels = [s.index.levels[0], *split_idx.levels]
names = [s.index.names[0], *[f'level_2_{i}' for i in range(1, split_idx.nlevels+1)]]
new_index = pd.MultiIndex(codes=codes, levels=levels, names=names)
print(new_index)
MultiIndex([('a1', 'b1', 'c1'),
('a1', 'b1', 'c2'),
('a1', 'b2', 'c1')],
names=['level_1', 'level_2_1', 'level_2_2'])
print(s.set_axis(new_index))
level_1 level_2_1 level_2_2
a1 b1 c1 0
c2 0
b2 c1 0
dtype: int64
MultiIndex.from_arrays
ConstructorThis is may be a more readable solution because the from_arrays
constructor is more familiar than the default constructor.
split_idx = index.get_level_values(1).str.split('/', expand=True)
# Build new index structure, seeded with the first level of the original Index
arrays = [s.index.get_level_values(0)]
names = [s.index.names[0]]
for level in range(split_idx.nlevels):
arrays.append(split_idx.get_level_values(level))
names.append(f'level_2_{level+1}')
new_index = pd.MultiIndex.from_arrays(arrays, names=names)
print(new_index)
MultiIndex([('a1', 'b1', 'c1'),
('a1', 'b1', 'c2'),
('a1', 'b2', 'c1')],
names=['level_1', 'level_2_1', 'level_2_2'])
print(s.set_axis(new_index))
level_1 level_2_1 level_2_2
a1 b1 c1 0
c2 0
b2 c1 0
dtype: int64
Upvotes: 1
Reputation: 170
Not too sure about your exact use-case, please share more details on your input/output. I would try something like the following if the MultiIndex is on the index.
df.reset_index(inplace=True)
df[["level_2_1","level_2_2"]] = df["level_2"].str.split("/",expand=True)
df.drop(["level_2"],axis=1,inplace=True)
df.set_index(["level_1","level_2_1","level_2_2"])
hope it helps!
Upvotes: 2