TheChymera
TheChymera

Reputation: 17924

Turn Pandas Multi-Index into column

I have a dataframe with 2 index levels:

                         value
Trial    measurement
    1              0        13
                   1         3
                   2         4
    2              0       NaN
                   1        12
    3              0        34 

Which I want to turn into this:

Trial    measurement       value

    1              0        13
    1              1         3
    1              2         4
    2              0       NaN
    2              1        12
    3              0        34 

How can I best do this?

I need this because I want to aggregate the data as instructed here, but I can't select my columns like that if they are in use as indices.

Upvotes: 273

Views: 264764

Answers (8)

kho
kho

Reputation: 1291

Similar to Alex solution in a more generalized form. It keeps the indexes untouched and adds the index levels as new columns with its name.

for i in df.index.names:
    df[i] = df.index.get_level_values(i)

which gives the new columns 'Trial' and 'measurement'

                   value Trial    measurement
Trial measurement             
    1           0     13     1              0     
                1      3     1              1     
                2      4     1              2     
  ...  

Upvotes: 1

Rafal Plaza
Rafal Plaza

Reputation: 11

A solution that might be helpful in cases when not every column has multiple index levels:

df.columns = df.columns.map(''.join)

Upvotes: 1

whitetiger1399
whitetiger1399

Reputation: 643

Short and simple

df2 = pd.DataFrame({'test_col': df['test_col'].describe()})
df2 = df2.reset_index()

Upvotes: 3

CraigSF
CraigSF

Reputation: 3422

The reset_index() is a pandas DataFrame method that will transfer index values into the DataFrame as columns. The default setting for the parameter is drop=False (which will keep the index values as columns).

All you have to do call .reset_index() after the name of the DataFrame:

df = df.reset_index()  

Upvotes: 339

Alex
Alex

Reputation: 3431

There may be situations when df.reset_index() cannot be used (e.g., when you need the index, too). In this case, use index.get_level_values() to access index values directly:

df['Trial'] = df.index.get_level_values(0)
df['measurement'] = df.index.get_level_values(1)

This will assign index values to individual columns and keep the index.

See the docs for further info.

Upvotes: 25

kevin_theinfinityfund
kevin_theinfinityfund

Reputation: 2157

I ran into Karl's issue as well. I just found myself renaming the aggregated column then resetting the index.

df = pd.DataFrame(df.groupby(['arms', 'success'])['success'].sum()).rename(columns={'success':'sum'})

enter image description here

df = df.reset_index()

enter image description here

Upvotes: 5

Karl Anka
Karl Anka

Reputation: 2849

This doesn't really apply to your case but could be helpful for others (like myself 5 minutes ago) to know. If one's multindex have the same name like this:

                         value
Trial        Trial
    1              0        13
                   1         3
                   2         4
    2              0       NaN
                   1        12
    3              0        34 

df.reset_index(inplace=True) will fail, cause the columns that are created cannot have the same names.

So then you need to rename the multindex with df.index = df.index.set_names(['Trial', 'measurement']) to get:

                           value
Trial    measurement       

    1              0        13
    1              1         3
    1              2         4
    2              0       NaN
    2              1        12
    3              0        34 

And then df.reset_index(inplace=True) will work like a charm.

I encountered this problem after grouping by year and month on a datetime-column(not index) called live_date, which meant that both year and month were named live_date.

Upvotes: 39

sameagol
sameagol

Reputation: 633

As @cs95 mentioned in a comment, to drop only one level, use:

df.reset_index(level=[...])

This avoids having to redefine your desired index after reset.

Upvotes: 19

Related Questions