John

Reputation: 43199

Make multiindex columns in a pandas dataframe

I have a pandas dataframe with the following structure:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(32).reshape((4, 8)),
                  index=pd.date_range('2016-01-01', periods=4),
                  columns=['male ; 0', 'male ; 1', 'male ; 2', 'male ; 3',
                           'female ; 0', 'female ; 1', 'female ; 2', 'female ; 3'])

The column names are messy: each header combines two variables, and residual punctuation from the original spreadsheet is still in them.

What I want to do is give the dataframe a column MultiIndex with two levels, named Sex and Age.

I tried using pd.MultiIndex.from_tuples like this:

columns = [('Male', 0),('Male', 1),('Male', 2),('Male', 3),('Female', 0),('Female', 1),('Female', 2),('Female', 3)]
df.columns = pd.MultiIndex.from_tuples(columns)

And then naming the column levels:

df.columns.names = ['Sex', 'Age']

This gives the result that I would like. However, my real dataframes have ages up to over 100 for each sex, so writing out every tuple by hand is not practical.

Could someone please guide me on how to build the MultiIndex columns from tuples programmatically?
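Since the existing headers already encode both variables as 'sex ; age', one option is to parse them instead of typing the tuples. This is a minimal sketch: the frame is recreated with ages 0 to 3 for both sexes, parse_header is a hypothetical helper name, and the sketch assumes every header contains exactly one ';' between a sex label and an integer age.

```python
import numpy as np
import pandas as pd

# Recreate the question's frame; headers look like 'male ; 0'
headers = [f'{sex} ; {age}' for sex in ('male', 'female') for age in range(4)]
df = pd.DataFrame(np.arange(32).reshape((4, 8)),
                  index=pd.date_range('2016-01-01', periods=4),
                  columns=headers)


def parse_header(col):
    """Hypothetical helper: turn 'male ; 0' into ('Male', 0)."""
    sex, age = (part.strip() for part in col.split(';'))
    return sex.capitalize(), int(age)


# Build the MultiIndex from the parsed tuples and name both levels at once
df.columns = pd.MultiIndex.from_tuples([parse_header(c) for c in df.columns],
                                       names=['Sex', 'Age'])
print(df['Female'])
```

Because the tuples come from the headers themselves, this works for any number of ages without knowing the range in advance.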

Upvotes: 21

Views: 41156

Answers (4)

Markus Dutschke

Reputation: 10606

generate multiindex df from dict

This is especially convenient if the MultiIndex columns cannot be generated by a combinatoric operation.

>>> import pandas as pd
>>> pd.DataFrame({("Male", 0): [1, 2], ("Male", 1): [3, 4], ("Female", 0): [5, 6], ("Female", "..."): [7, 8]})
  Male    Female    
     0  1      0 ...
0    1  3      5   7
1    2  4      6   8

If you want to name the column levels of the df as well, use:

>>> import pandas as pd
>>> df = pd.DataFrame({("Male", 0): [1, 2], ("Male", 1): [3, 4], ("Female", 0): [5, 6], ("Female", "..."): [7, 8]})
>>> df.columns.names = ['Sex', 'Age']
>>> df
Sex Male    Female    
Age    0  1      0 ...
0      1  3      5   7
1      2  4      6   8
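The dict keys can also be generated, which keeps this approach usable when each sex has a different set of ages and a full Cartesian product would be wrong. A sketch with made-up ages and placeholder values; observed is a hypothetical name:

```python
import pandas as pd

# Made-up example: the ages present differ per sex, so the columns
# are not a complete product of sex x age.
observed = {'Male': [0, 1, 2], 'Female': [0, 5]}

# One (sex, age) tuple key per column; tuple keys become a column MultiIndex
data = {
    (sex, age): [10 * age, 10 * age + 1]  # placeholder values, two rows
    for sex, ages in observed.items()
    for age in ages
}
df = pd.DataFrame(data)
df.columns.names = ['Sex', 'Age']
print(df)
```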

Upvotes: 1

Markus Dutschke

Reputation: 10606

compact one-liner

>>> import numpy as np
>>> import pandas as pd
>>> pd.DataFrame(np.arange(8).reshape((2,4)), columns=pd.MultiIndex.from_tuples([("m", 0), ("m", 1), ("f", 0), ("f", "...")], names=["sex", "age"]))
sex  m     f    
age  0  1  0 ...
0    0  1  2   3
1    4  5  6   7
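If the sex and age labels are already held in two parallel lists (for example, after splitting the original headers), pd.MultiIndex.from_arrays builds the same index without zipping them into tuples first. A sketch reusing the one-liner's labels, with the "..." label replaced by 1:

```python
import numpy as np
import pandas as pd

# Parallel lists: position i gives the sex and age of column i
sexes = ['m', 'm', 'f', 'f']
ages = [0, 1, 0, 1]

cols = pd.MultiIndex.from_arrays([sexes, ages], names=['sex', 'age'])
df = pd.DataFrame(np.arange(8).reshape((2, 4)), columns=cols)

# The same labels built from tuples, for comparison
same = pd.MultiIndex.from_tuples(list(zip(sexes, ages)), names=['sex', 'age'])
print(cols.equals(same))
```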

Upvotes: 1

Def_Os

Reputation: 5467

Jaco's answer works nicely, but you can also create the MultiIndex directly from the Cartesian product using .from_product():

import pandas as pd

sex = ['Male', 'Female']
age = range(100)  # 2 sexes x 100 ages = 200 labels; must equal len(df.columns)
df.columns = pd.MultiIndex.from_product([sex, age], names=['Sex', 'Age'])
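For the 8-column frame in the question, the ages would have to be range(4) so that the product yields exactly one label per existing column; assigning an index of the wrong length raises a ValueError. A sketch on that frame (the original messy headers are simply replaced):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(32).reshape((4, 8)),
                  index=pd.date_range('2016-01-01', periods=4))

sex = ['Male', 'Female']
age = range(4)  # 2 x 4 = 8 labels, one per existing column
df.columns = pd.MultiIndex.from_product([sex, age], names=['Sex', 'Age'])

# from_product varies the last level fastest: Male 0..3, then Female 0..3
print(df.columns.tolist())

# Selecting one sex returns a frame whose columns are just the ages
print(df['Female'].columns.tolist())
```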

Upvotes: 23

Alex

Reputation: 21766

You can use the itertools module to generate your columns variable by taking the Cartesian product of the sexes and the age range in your data, for example:

import itertools
import pandas as pd

max_age = 100
sex = ['Male', 'Female']
age = range(max_age)
# Every (sex, age) pair: all Male ages first, then all Female ages
columns = list(itertools.product(sex, age))
df.columns = pd.MultiIndex.from_tuples(columns)
df.columns.names = ['Sex', 'Age']
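The itertools.product route and MultiIndex.from_product should give the same labels in the same order. A quick check using this answer's sex and age values:

```python
import itertools

import pandas as pd

sex = ['Male', 'Female']
age = range(100)

via_tuples = pd.MultiIndex.from_tuples(list(itertools.product(sex, age)),
                                       names=['Sex', 'Age'])
via_product = pd.MultiIndex.from_product([sex, age], names=['Sex', 'Age'])

# equals() compares the labels only; level names are compared separately
print(via_tuples.equals(via_product))
print(list(via_tuples.names) == list(via_product.names))
print(len(via_tuples))  # 2 sexes x 100 ages
```

from_product skips building the intermediate list of tuples, which is why it is usually the shorter option when the columns really are a full product.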

Upvotes: 10
