Reputation: 78873
pandas has support for multi-level column names:
>>> import numpy as np
>>> import pandas as pd
>>> x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b','c'],'bar':np.random.rand(3)})
>>> x = x.set_index(['instance','foo']).transpose()
>>> x.columns
MultiIndex
[(u'first', u'a'), (u'first', u'b'), (u'first', u'c')]
>>> x
instance first
foo a b c
bar 0.102885 0.937838 0.907467
This feature is very useful since it allows multiple versions of the same dataframe to be appended 'horizontally', with the first level of the column names (instance in my example) distinguishing the instances.
Imagine I already have a dataframe like this:
a b c
bar 0.102885 0.937838 0.907467
Is there a nice way to add another level to the column names, similar to this for the row index:
x['instance'] = 'first'
x.set_index('instance', append=True)
Upvotes: 55
Views: 152561
Reputation: 2167
Building on the pd.concat approach by Carl: what if we get only one row per iteration? This is not an optimized approach, but you can do something like this:
# initial
ds = []
# first iteration (can be inside function)
d = {}
d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[10, 0.89, 0.98, 0.31]]).set_index('idx')
d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[10, 0.29, 0.63, 0.99]]).set_index('idx')
ds.append(pd.concat(d, axis=1))
# display(ds[0])
# second iteration (can be inside function)
d = {}
d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[20, 0.34, 0.78, 0.34]]).set_index('idx')
d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[20, 0.23, 0.26, 0.98]]).set_index('idx')
ds.append(pd.concat(d, axis=1))
# display(ds[1])
# final concat
pd.concat(ds, axis=0)
Result:
ds[0]:

| | first_level | first_level | first_level | second_level | second_level | second_level |
|---|---|---|---|---|---|---|
| idx | a | b | c | a | b | c |
| 10 | 0.89 | 0.98 | 0.31 | 0.29 | 0.63 | 0.99 |

ds[1]:

| | first_level | first_level | first_level | second_level | second_level | second_level |
|---|---|---|---|---|---|---|
| idx | a | b | c | a | b | c |
| 20 | 0.34 | 0.78 | 0.34 | 0.23 | 0.26 | 0.98 |

Final concat:

| | first_level | first_level | first_level | second_level | second_level | second_level |
|---|---|---|---|---|---|---|
| idx | a | b | c | a | b | c |
| 10 | 0.89 | 0.98 | 0.31 | 0.29 | 0.63 | 0.99 |
| 20 | 0.34 | 0.78 | 0.34 | 0.23 | 0.26 | 0.98 |
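If the rows really do arrive one at a time, the two iterations above can be folded into a loop. This is only a sketch: the helper name one_row and the hard-coded values are mine, chosen to match the example data. The idea is to collect the one-row frames in a list and call pd.concat once at the end rather than growing a frame inside the loop.

```python
import pandas as pd


def one_row(idx, first_vals, second_vals):
    """Build a single-row frame with a two-level column index (hypothetical helper)."""
    cols = ["a", "b", "c"]
    row_index = pd.Index([idx], name="idx")
    parts = {
        "first_level": pd.DataFrame([first_vals], columns=cols, index=row_index),
        "second_level": pd.DataFrame([second_vals], columns=cols, index=row_index),
    }
    # The dict keys become the new top column level.
    return pd.concat(parts, axis=1)


# Each tuple stands in for whatever one iteration produces.
incoming = [
    (10, [0.89, 0.98, 0.31], [0.29, 0.63, 0.99]),
    (20, [0.34, 0.78, 0.34], [0.23, 0.26, 0.98]),
]

rows = [one_row(idx, first, second) for idx, first, second in incoming]

# Stack the one-row frames vertically in a single call.
result = pd.concat(rows, axis=0)
print(result)
```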
Upvotes: 1
Reputation: 140
import numpy as np
import pandas as pd

x = [('G1','a'),("G1",'b'),("G2",'a'),('G2','b')]
y = [('K1','l'),("K1",'m'),("K2",'l'),('K2','m'),("K3",'l'),('K3','m')]
row_list = pd.MultiIndex.from_tuples(x)
col_list = pd.MultiIndex.from_tuples(y)
A = pd.DataFrame(np.random.randint(2,5,(4,6)), row_list,col_list)
A
This is a simple way to create multi-level columns and rows.
Upvotes: 6
Reputation: 628
There is no need to create a list of tuples.
Use pd.MultiIndex.from_product(iterables):
import pandas as pd
import numpy as np
df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
df.columns = pd.MultiIndex.from_product([["new_label"], df.columns])
Resultant DataFrame:
new_label
a b c
0 0.25999 0.337535 0.333568
Pull request from Jan 25, 2014
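To tie this back to the use case in the question (appending several versions of the same frame 'horizontally'), here is a minimal sketch. The helper name add_top_level and the sample values are my own, not part of pandas.

```python
import pandas as pd


def add_top_level(df, label):
    """Return a copy of df whose columns gain `label` as a new top level (hypothetical helper)."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_product([[label], df.columns])
    return out


# Two "instances" of the same single-row frame, with made-up values.
first = pd.DataFrame([[0.1, 0.2, 0.3]], index=["bar"], columns=["a", "b", "c"])
second = pd.DataFrame([[0.4, 0.5, 0.6]], index=["bar"], columns=["a", "b", "c"])

# Side-by-side concat; the top level tells the instances apart.
wide = pd.concat(
    [add_top_level(first, "first"), add_top_level(second, "second")],
    axis=1,
)
print(wide)
```

Working on a copy keeps the original frames untouched, which seems safer when the same frame is reused elsewhere.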
Upvotes: 24
Reputation: 154
A lot of these solutions seem just a bit more complex than they need to be.
I prefer to make things look as simple and intuitive as possible when speed isn't absolutely necessary. I think this solution accomplishes that.
Tested in versions of pandas as early as 0.22.0.
Simply create a DataFrame (ignore the columns in the first step) and then set columns equal to your n-dim list of column names.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])
In [3]: df
Out[3]:
0 1 2 3
0 1 1 1 1
1 2 2 2 2
In [4]: df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]
In [5]: df
Out[5]:
a c e g
b d f h
0 1 1 1 1
1 2 2 2 2
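As far as I can tell, assigning a nested list like this ends up building a MultiIndex from the inner lists, one list per level. If you would rather spell that out, a sketch using pd.MultiIndex.from_arrays on the same data looks like this, along with how the columns are then selected:

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])

# One inner list per level: the first list is the top level, the second the lower level.
df.columns = pd.MultiIndex.from_arrays(
    [["a", "c", "e", "g"], ["b", "d", "f", "h"]]
)

# Selecting by a top-level label returns the sub-frame under that label.
sub = df["a"]

# Selecting by a full tuple returns a single column as a Series.
col = df[("a", "b")]

print(sub)
print(col)
```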
Upvotes: 7
Reputation: 41
Here is a function that builds the list of tuples used by pd.MultiIndex.from_tuples() a bit more generically. Got the idea from @user3377361.
def create_tuple_for_for_columns(df_a, multi_level_col):
    """
    Create the list of column tuples that pd.MultiIndex.from_tuples() needs to build multi-level columns
    :param df_a: pandas dataframe whose existing columns become the lower level of the multi index
    :param multi_level_col: label of the new top level placed above every column
    :return: list of (multi_level_col, existing_column) tuples
    """
    temp_columns = []
    for item in df_a.columns:
        temp_columns.append((multi_level_col, item))
    return temp_columns
It can be used like this:
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns = create_tuple_for_for_columns(df, 'c')
df.columns = pd.MultiIndex.from_tuples(columns)
Upvotes: 1
Reputation: 998
You can use concat. Give it a dictionary of dataframes where the key is the new column level you want to add.
In [46]: d = {}
In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[10, 0.89, 0.98, 0.31],
[20, 0.34, 0.78, 0.34]]).set_index('idx')
In [48]: pd.concat(d, axis=1)
Out[48]:
first_level
a b c
idx
10 0.89 0.98 0.31
20 0.34 0.78 0.34
You can use the same technique to create multiple levels.
In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[10, 0.29, 0.63, 0.99],
[20, 0.23, 0.26, 0.98]]).set_index('idx')
In [50]: pd.concat(d, axis=1)
Out[50]:
first_level second_level
a b c a b c
idx
10 0.89 0.98 0.31 0.29 0.63 0.99
20 0.34 0.78 0.34 0.23 0.26 0.98
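If you only have one frame and do not want to build a dictionary, pd.concat also accepts a list together with the keys argument, and names lets you label the new level. A small sketch using the same data as above (the level name 'instance' is borrowed from the question):

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [0.89, 0.34], "b": [0.98, 0.78], "c": [0.31, 0.34]},
    index=pd.Index([10, 20], name="idx"),
)

# keys supplies the label for the new top level; names labels that level.
wide = pd.concat([df], axis=1, keys=["first_level"], names=["instance"])
print(wide)
```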
Upvotes: 21
Reputation: 621
Try this:
df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns=[('c','a'),('c','b')]
df.columns=pd.MultiIndex.from_tuples(columns)
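The tuples do not have to be typed out by hand, and from_tuples takes a names argument if you want the levels labelled. A sketch of both (the level names 'instance' and 'foo' are taken from the question, the rest is the same data as above):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Pair the new top-level label 'c' with every existing column name.
columns = [("c", col) for col in df.columns]
df.columns = pd.MultiIndex.from_tuples(columns, names=["instance", "foo"])

# With named levels, xs can pull out everything under one top-level label.
sub = df.xs("c", axis=1, level="instance")

print(df)
print(sub)
```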
Upvotes: 62