LondonRob
LondonRob

Reputation: 78873

Pandas: Multilevel column names

pandas has support for multi-level column names:

>>>  x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b','c'],'bar':rand(3)})
>>> x = x.set_index(['instance','foo']).transpose()
>>> x.columns
MultiIndex
[(u'first', u'a'), (u'first', u'b'), (u'first', u'c')]
>>> x
instance     first                    
foo              a         b         c
bar       0.102885  0.937838  0.907467

This feature is very useful since it allows multiple versions of the same dataframe to be appended 'horizontally' with the 1st level of the column names (in my example instance) distinguishing the instances.

Imagine I already have a dataframe like this:

                 a         b         c
bar       0.102885  0.937838  0.907467

Is there a nice way to add another level to the column names, similar to this for row index:

x['instance'] = 'first'
x.set_level('instance',append=True)

Upvotes: 55

Views: 152561

Answers (7)

Muhammad Yasirroni
Muhammad Yasirroni

Reputation: 2167

Improving the pd.concat approach by Carl, how if we got only one row each iteration? This is not an optimized approach, but you can do something like this:

# initial
ds = []

# first iteration (can be inside function)
d = {}
d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[10, 0.89, 0.98, 0.31]]).set_index('idx')
d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[10, 0.29, 0.63, 0.99]]).set_index('idx')
ds.append(pd.concat(d, axis=1))

# display(ds[0])

# second iteration (can be inside function)
d = {}
d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[20, 0.34, 0.78, 0.34]]).set_index('idx')
d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[20, 0.23, 0.26, 0.98]]).set_index('idx')
ds.append(pd.concat(d, axis=1))

# display(ds[1])

# final concat
pd.concat(ds, axis=0)

Result:

first_level second_level
idx a b c a b c
10 0.89 0.98 0.31 0.29 0.63 0.99
first_level second_level
idx a b c a b c
20 0.34 0.78 0.34 0.23 0.26 0.98
first_level second_level
idx a b c a b c
10 0.89 0.98 0.31 0.29 0.63 0.99
20 0.34 0.78 0.34 0.23 0.26 0.98

Upvotes: 1

Raj_Ame09
Raj_Ame09

Reputation: 140

x = [('G1','a'),("G1",'b'),("G2",'a'),('G2','b')]
y = [('K1','l'),("K1",'m'),("K2",'l'),('K2','m'),("K3",'l'),('K3','m')]
row_list = pd.MultiIndex.from_tuples(x)
col_list = pd.MultiIndex.from_tuples(y)

A = pd.DataFrame(np.random.randint(2,5,(4,6)), row_list,col_list)
A

This is the most simple and easy way to create MultiLevel columns and rows.

enter image description here

Upvotes: 6

Ian Zurutuza
Ian Zurutuza

Reputation: 628

No need to create a list of tuples

Use: pd.MultiIndex.from_product(iterables)

import pandas as pd
import numpy as np

df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
df.columns = pd.MultiIndex.from_product([["new_label"], df.columns])

Resultant DataFrame:

  new_label                    
          a         b         c
0   0.25999  0.337535  0.333568

Pull request from Jan 25, 2014

Upvotes: 24

Keith
Keith

Reputation: 154

A lot of these solutions seem just a bit more complex than they need to be.

I prefer to make things look as simple and intuitive as possible when speed isn't absolutely necessary. I think this solution accomplishes that. Tested in versions of pandas as early as 0.22.0.

Simply create a DataFrame (ignore columns in the first step) and then set colums equal to your n-dim list of column names.

In [1]: import pandas as pd                                                                                                                                                                                          

In [2]: df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])                                                                                                                                                              

In [3]: df                                                                                                                                                                                                           
Out[3]: 
   0  1  2  3
0  1  1  1  1
1  2  2  2  2

In [4]: df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]                                                                                                                                                    

In [5]: df                                                                                                                                                                                                           
Out[5]: 
   a  c  e  g
   b  d  f  h
0  1  1  1  1
1  2  2  2  2

Upvotes: 7

Charl
Charl

Reputation: 41

Here is a function that can help you create the tuple, that can be used by pd.MultiIndex.from_tuples(), a bit more generically. Got the idea from @user3377361.

def create_tuple_for_for_columns(df_a, multi_level_col):
    """
    Create a columns tuple that can be pandas MultiIndex to create multi level column

    :param df_a: pandas dataframe containing the columns that must form the first level of the multi index
    :param multi_level_col: name of second level column
    :return: tuple containing (second_level_col, firs_level_cols)
    """
    temp_columns = []
    for item in df_a.columns:
        temp_columns.append((multi_level_col, item))
    return temp_columns

It can be used like this:

df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns = create_tuple_for_for_columns(df, 'c')
df.columns = pd.MultiIndex.from_tuples(columns)

Upvotes: 1

Carl
Carl

Reputation: 998

You can use concat. Give it a dictionary of dataframes where the key is the new column level you want to add.

In [46]: d = {}

In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[10, 0.89, 0.98, 0.31],
                                               [20, 0.34, 0.78, 0.34]]).set_index('idx')

In [48]: pd.concat(d, axis=1)
Out[48]:
    first_level
              a     b     c
idx
10         0.89  0.98  0.31
20         0.34  0.78  0.34

You can use the same technique to create multiple levels.

In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[10, 0.29, 0.63, 0.99],
                                                [20, 0.23, 0.26, 0.98]]).set_index('idx')

In [50]: pd.concat(d, axis=1)
Out[50]:
    first_level             second_level
              a     b     c            a     b     c
idx
10         0.89  0.98  0.31         0.29  0.63  0.99
20         0.34  0.78  0.34         0.23  0.26  0.98

Upvotes: 21

user3377361
user3377361

Reputation: 621

Try this:

df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

columns=[('c','a'),('c','b')]

df.columns=pd.MultiIndex.from_tuples(columns)

Upvotes: 62

Related Questions