Reputation: 810
I am surprised by the behaviour of set_index when the columns are a MultiIndex.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: pd.__version__
Out[3]: '0.19.2'
In [4]: columns = pd.MultiIndex.from_tuples([('foo', 'a'), ('foo', 'b'), ('bar', 'c')])
In [5]: df = pd.DataFrame(np.random.randint(0, 10, (3,3)), columns=columns)
In [6]: df.set_index([('bar', 'c')]).columns
Out[6]:
MultiIndex(levels=[['bar', 'foo'], ['a', 'b', 'c']],
labels=[[1, 1], [0, 1]])
Why is ('bar', 'c') still part of the columns? Its values 'bar' and 'c' still appear in the levels.
This seems different from non-MultiIndex columns, where a column that is set as the index
disappears from the columns.
Thanks.
Upvotes: 2
Views: 1323
Reputation: 294488
Indeed it's weird. It's been known for quite some time.
Here is a [snarky] very handy workaround [/snarky]:
rename the columns by mapping each one onto itself...
df.set_index([('bar', 'c')]).rename(
columns=df.columns.to_series().to_dict()).columns
MultiIndex(levels=[['foo'], ['a', 'b']],
labels=[[0, 0], [0, 1]])
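If upgrading is an option, a less roundabout route may be MultiIndex.remove_unused_levels, which was added in pandas 0.20 and so is not available in the 0.19.2 used in the question. The sketch below reuses the question's frame and assumes a newer pandas. It also shows that the ('bar', 'c') column really is gone after set_index; only the stale level values linger.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
columns = pd.MultiIndex.from_tuples([('foo', 'a'), ('foo', 'b'), ('bar', 'c')])
df = pd.DataFrame(np.random.randint(0, 10, (3, 3)), columns=columns)

result = df.set_index([('bar', 'c')])

# The column itself has been removed: only the two 'foo' columns remain.
remaining = list(result.columns)

# The levels may still carry the unused 'bar' and 'c' values.
# remove_unused_levels() (assumes pandas >= 0.20) returns a trimmed MultiIndex.
trimmed = result.columns.remove_unused_levels()
top_level = list(trimmed.levels[0])
second_level = list(trimmed.levels[1])

print(remaining)
print(top_level, second_level)
```

Assigning the trimmed index back with result.columns = trimmed should give the same cleaned-up levels as the rename trick above.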
Upvotes: 3