Flavien Lambert

Reputation: 810

Pandas set_index with MultiIndex columns

I am surprised by the behaviour of set_index when the columns are a MultiIndex.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: pd.__version__
Out[3]: '0.19.2'

In [4]: columns = pd.MultiIndex.from_tuples([('foo', 'a'), ('foo', 'b'), ('bar', 'c')])

In [5]: df = pd.DataFrame(np.random.randint(0, 10, (3,3)), columns=columns)

In [6]: df.set_index([('bar', 'c')]).columns
Out[6]: 
MultiIndex(levels=[['bar', 'foo'], ['a', 'b', 'c']],
           labels=[[1, 1], [0, 1]])

Why is ('bar', 'c') still part of the columns? This seems different from the behaviour with non-MultiIndex columns, where a column disappears from the columns once it is set as the index.

Thanks.

Upvotes: 2

Views: 1323

Answers (1)

piRSquared

Reputation: 294488

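Indeed, it's weird, and it has been known for quite some time. Note that the column itself is gone: the labels in your output describe only two columns, ('foo', 'a') and ('foo', 'b'). What lingers is the unused 'bar' and 'c' entries in the MultiIndex levels.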

Here is a [snarky] very handy workaround [/snarky]:

Rename the columns by mapping each label onto itself, which rebuilds the column MultiIndex from the labels that remain:

df.set_index([('bar', 'c')]).rename(
    columns=df.columns.to_series().to_dict()).columns

MultiIndex(levels=[['foo'], ['a', 'b']],
           labels=[[0, 0], [0, 1]])
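
For readers on a newer pandas than the 0.19.2 shown in the question, here is a minimal sketch of a more direct route. It assumes `MultiIndex.remove_unused_levels()` is available (as far as I know it was added in 0.20, so it would not work on the asker's version). The names `result`, `remaining`, `cleaned` and `level_values` are just illustrative.

```python
import numpy as np
import pandas as pd

# Same setup as in the question.
columns = pd.MultiIndex.from_tuples([("foo", "a"), ("foo", "b"), ("bar", "c")])
df = pd.DataFrame(np.random.randint(0, 10, (3, 3)), columns=columns)

result = df.set_index([("bar", "c")])

# The actual column labels: ('bar', 'c') is no longer among them,
# even if the levels of the MultiIndex may still mention 'bar' and 'c'.
remaining = list(result.columns)
print(remaining)

# Assumption: pandas >= 0.20, where remove_unused_levels() exists.
# It returns a new MultiIndex whose levels only hold values still in use.
cleaned = result.columns.remove_unused_levels()
result.columns = cleaned

level_values = [list(level) for level in cleaned.levels]
print(level_values)
```

This only tidies the level metadata; the data and the column labels of `result` are the same before and after.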

Upvotes: 3
