Pandas: multiindexing column headers

Question

I have time-series DataFrame df that looks like this:

time                               A                    B                    C                      D                E
2011-01-04 15:55:00             0.003452             0.005303             0.016632             0.009611             0.000726 ...
2011-01-04 15:56:00             0.004292             0.064709             0.012159             0.020398             0.000272 ...
2011-01-04 15:57:00             0.006617             0.009344             0.018512             0.029696             0.001235 ...
2011-01-04 15:58:00             0.005883             0.048564             0.020213             0.072104             0.001337 ...
2011-01-04 15:59:00             0.008602             0.047318             0.024568             0.051225             0.002775 ...

I want to add a multiindex header to categorize each columns. So for instance, I want to group column A and B into group 1 and C and D into group 2 and E in group 3. Therefore I would have:

    group                              1                    1                    2                     2                 3
    time                               A                    B                    C                     D                 E
    2011-01-04 15:55:00             0.003452             0.005303             0.016632             0.009611             0.000726 ...
    2011-01-04 15:56:00             0.004292             0.064709             0.012159             0.020398             0.000272 ...
    2011-01-04 15:57:00             0.006617             0.009344             0.018512             0.029696             0.001235 ...
    2011-01-04 15:58:00             0.005883             0.048564             0.020213             0.072104             0.001337 ...
    2011-01-04 15:59:00             0.008602             0.047318             0.024568             0.051225             0.002775 ...

How can I do so with a conditional statement (if column == 'A' | 'B' then assign to group 1, etc...)?

TomAugspurger · Accepted Answer

You can use the MultiIndex.from_tuples method:

In [8]: df
Out[8]: 
          A         B         C         D         E
0 -1.194136 -0.467446  0.535237 -1.714561  0.599319
1  0.575012 -0.737171  0.287243  0.076106 -0.413430
2  0.635213  1.028926 -2.052527  0.922869  0.543720
3  2.411044 -1.162996  1.392117  0.328242  1.253519
4  1.156001  2.223206  1.664754  0.223262  0.854058
5  0.003120  0.064644 -0.981392 -1.468138 -0.454028
6  0.529241 -0.590786  0.380155 -1.315147  1.669510
7  0.163857  0.888864  0.957516 -0.567799 -0.563448
8  1.978112 -0.744517 -0.487620 -2.629270 -0.896176
9 -0.038746  1.543167  0.511869 -0.080620  0.079988

[10 rows x 5 columns]

In [9]: header = [1, 1, 2, 2, 3, 3]

In [10]: df.columns = pd.MultiIndex.from_tuples(list(zip(header, df.columns)))

In [11]: df
Out[11]: 
          1                   2                   3
          A         B         C         D         E
0 -1.194136 -0.467446  0.535237 -1.714561  0.599319
1  0.575012 -0.737171  0.287243  0.076106 -0.413430
2  0.635213  1.028926 -2.052527  0.922869  0.543720
3  2.411044 -1.162996  1.392117  0.328242  1.253519
4  1.156001  2.223206  1.664754  0.223262  0.854058
5  0.003120  0.064644 -0.981392 -1.468138 -0.454028
6  0.529241 -0.590786  0.380155 -1.315147  1.669510
7  0.163857  0.888864  0.957516 -0.567799 -0.563448
8  1.978112 -0.744517 -0.487620 -2.629270 -0.896176
9 -0.038746  1.543167  0.511869 -0.080620  0.079988

[10 rows x 5 columns]

In [9], where I create the top level of the MultiIndex can be made in a bunch of different ways. If you have more cols you can do something like np.arange(len(df.columns)).repeat(2)

Pandas: multiindexing column headers

Answers (1)

Related Questions