ARJ
ARJ

Reputation: 2080

Splitting the header into multiple headers in DataFrame

I have a DataFrame where I need to split the header into multiple rows as headers for the same Dataframe.

The dataframe looks like this,

My data Frame looks like follows,

gene    ALL_ID_1    AML_ID_1    AML_ID_2    AML_ID_3    AML_ID_4    AML_ID_5    Stroma_ID_1 Stroma_ID_2 Stroma_ID_3 Stroma_ID_4 Stroma_ID_5 Stroma_CR_Pat_4 Stroma_CR_Pat_5 Stroma_CR_Pat_6 Stroma_CR_Pat_7 Stroma_CR_Pat_8
ENSG    8   1   11  5   10  0   628 542 767 578 462 680 513 968 415 623
ENSG    0   0   1   0   0   0   0   28  1   3   0   1   4   0   0   0
ENSG    661 1418    2580    6817    14727   5968    9   3   5   9   2   9   3   3   5   1
ENSG    20  315 212 8   790 471 1283    2042    1175    2839    1110    857 1880    1526    2262    2624
ENSG    11  26  24  9   11  2   649 532 953 463 468 878 587 245 722 484

And I want the the above header to be spitted as follows,

  network   ID  ID                               REL                
node    B_ALL   AML                 Stroma                                  
hemi    1   1   2   3   4   5   1   2   3   4   5   6   7   8   9   10
ENSG    8   1   11  5   10  0   628 542 767 578 462 680 513 968 415 623
ENSG    0   0   1   0   0   0   0   28  1   3   0   1   4   0   0   0
ENSG    661 1418    2580    6817    14727   5968    9   3   5   9   2   9   3   3   5   1
ENSG    20  315 212 8   790 471 1283    2042    1175    2839    1110    857 1880    1526    2262    2624
ENSG    11  26  24  9   11  2   649 532 953 463 468 878 587 245 722 484

Any help is greatly appreciated ..

Upvotes: 0

Views: 1077

Answers (1)

CT Zhu
CT Zhu

Reputation: 54340

Probably not the best minimal example you put here, very few people has the subject knowledge to understand what is network, node and hemi in your context.

You just need to create your MultiIndex and replace your column index with the one you created:

There are 3 rules in your example:

  • 1, whenever 'Stroma' is found, the column belongs to REL, otherwise belongs to ID.
  • 2, node is the first field of the initial column names
  • 3, hemi is the last field of the initial column names

Then, just code away:

In [110]:
df.columns = pd.MultiIndex.from_tuples(zip(np.where(df.columns.str.find('Stroma')!=-1, 'REL', 'ID'),
                                           df.columns.map(lambda x: x.split('_')[0]),
                                           df.columns.map(lambda x: x.split('_')[-1])), 
                                       names=['network', 'node', 'hemi'])

print df

network   ID                                   REL                          \
node     ALL   AML                          Stroma                           
hemi       1     1     2     3      4     5      1     2     3     4     5   
gene                                                                         
ENSG       8     1    11     5     10     0    628   542   767   578   462   
ENSG       0     0     1     0      0     0      0    28     1     3     0   
ENSG     661  1418  2580  6817  14727  5968      9     3     5     9     2   
ENSG      20   315   212     8    790   471   1283  2042  1175  2839  1110   
ENSG      11    26    24     9     11     2    649   532   953   463   468   

network                               
node                                  
hemi       4     5     6     7     8  
gene                                  
ENSG     680   513   968   415   623  
ENSG       1     4     0     0     0  
ENSG       9     3     3     5     1  
ENSG     857  1880  1526  2262  2624  
ENSG     878   587   245   722   484  

Upvotes: 2

Related Questions