Reputation: 3928
I have time-series DataFrame df
that looks like this:
time A B C D E
2011-01-04 15:55:00 0.003452 0.005303 0.016632 0.009611 0.000726 ...
2011-01-04 15:56:00 0.004292 0.064709 0.012159 0.020398 0.000272 ...
2011-01-04 15:57:00 0.006617 0.009344 0.018512 0.029696 0.001235 ...
2011-01-04 15:58:00 0.005883 0.048564 0.020213 0.072104 0.001337 ...
2011-01-04 15:59:00 0.008602 0.047318 0.024568 0.051225 0.002775 ...
I want to add a multiindex header to categorize each columns. So for instance, I want to group column A and B into group 1 and C and D into group 2 and E in group 3. Therefore I would have:
group 1 1 2 2 3
time A B C D E
2011-01-04 15:55:00 0.003452 0.005303 0.016632 0.009611 0.000726 ...
2011-01-04 15:56:00 0.004292 0.064709 0.012159 0.020398 0.000272 ...
2011-01-04 15:57:00 0.006617 0.009344 0.018512 0.029696 0.001235 ...
2011-01-04 15:58:00 0.005883 0.048564 0.020213 0.072104 0.001337 ...
2011-01-04 15:59:00 0.008602 0.047318 0.024568 0.051225 0.002775 ...
How can I do so with a conditional statement (if column == 'A' | 'B' then assign to group 1, etc...)?
Upvotes: 5
Views: 3625
Reputation: 28926
You can use the MultiIndex.from_tuples
method:
In [8]: df
Out[8]:
A B C D E
0 -1.194136 -0.467446 0.535237 -1.714561 0.599319
1 0.575012 -0.737171 0.287243 0.076106 -0.413430
2 0.635213 1.028926 -2.052527 0.922869 0.543720
3 2.411044 -1.162996 1.392117 0.328242 1.253519
4 1.156001 2.223206 1.664754 0.223262 0.854058
5 0.003120 0.064644 -0.981392 -1.468138 -0.454028
6 0.529241 -0.590786 0.380155 -1.315147 1.669510
7 0.163857 0.888864 0.957516 -0.567799 -0.563448
8 1.978112 -0.744517 -0.487620 -2.629270 -0.896176
9 -0.038746 1.543167 0.511869 -0.080620 0.079988
[10 rows x 5 columns]
In [9]: header = [1, 1, 2, 2, 3, 3]
In [10]: df.columns = pd.MultiIndex.from_tuples(list(zip(header, df.columns)))
In [11]: df
Out[11]:
1 2 3
A B C D E
0 -1.194136 -0.467446 0.535237 -1.714561 0.599319
1 0.575012 -0.737171 0.287243 0.076106 -0.413430
2 0.635213 1.028926 -2.052527 0.922869 0.543720
3 2.411044 -1.162996 1.392117 0.328242 1.253519
4 1.156001 2.223206 1.664754 0.223262 0.854058
5 0.003120 0.064644 -0.981392 -1.468138 -0.454028
6 0.529241 -0.590786 0.380155 -1.315147 1.669510
7 0.163857 0.888864 0.957516 -0.567799 -0.563448
8 1.978112 -0.744517 -0.487620 -2.629270 -0.896176
9 -0.038746 1.543167 0.511869 -0.080620 0.079988
[10 rows x 5 columns]
In [9]
, where I create the top level of the MultiIndex can be made in a bunch of different ways. If you have more cols you can do something like np.arange(len(df.columns)).repeat(2)
Upvotes: 7