Add a column to each parent column in multi header dataframe

Question

So I have a function that returns a dataframe with a format similar to the df bellow.

header = pd.MultiIndex.from_product([['location1','location2'],
                                     ['S1','S2','S3']],
                                    names=['loc','S'])
df = pd.DataFrame(np.random.randn(5, 6), 
                  index=['a','b','c','d','e'], 
                  columns=header)
df
loc location1                     location2                    
S          S1        S2        S3        S1        S2        S3
a   -1.245988  0.858071 -1.433669  0.105300 -0.630531 -0.148113
b    1.132016  0.318813  0.949564 -0.349722 -0.904325  0.443206
c   -0.017991  0.032925  0.274248  0.326454 -0.108982  0.567472
d    2.363533 -1.676141  0.562893  0.967338 -1.071719 -0.321113
e    1.921324  0.110705  0.023244 -0.432196  0.172972 -0.50368

What I would like to do is to create a new column in each location, to create a table like so.

loc location1                               location2                              
S          S1        S2        S3        S4        S1        S2        S3        S4
a   -1.245988  0.858071 -1.433669 -1.433669  0.105300 -0.630531 -0.148113 -1.433669
b    1.132016  0.318813  0.949564  0.949564 -0.349722 -0.904325  0.443206  0.949564
c   -0.017991  0.032925  0.274248  0.274248  0.326454 -0.108982  0.567472  0.274248
d    2.363533 -1.676141  0.562893  0.562893  0.967338 -1.071719 -0.321113  0.562893
e    1.921324  0.110705  0.023244  0.023244 -0.432196  0.172972 -0.50368   0.023244

Everytime I try to create a column in a similar fashion to normal DataFrames, it says the index does not exist...

Is this even possible? Or do I need to recreate the dataframe with the columns I need, and populate the data?

Thank you for your help.

jezrael · Accepted Answer

If need new column filled by values location1, S3 use:

df = df.stack(0)
df['S4'] = df.groupby(level=0)['S3'].transform('first')
df = df.unstack().swaplevel(1,0,axis=1).sort_index(axis=1)
print (df)
loc location1                               location2                      \
S          S1        S2        S3        S4        S1        S2        S3   
a    0.048870  0.883278  0.659195  0.659195 -1.172545  0.311774 -1.436998   
b   -0.151088 -0.994126  0.814953  0.814953 -0.097351  0.219566 -0.105680   
c    0.164416 -1.241656 -0.950665 -0.950665 -0.413584 -1.626581 -0.116679   
d   -0.343600  1.652102  0.488441  0.488441 -0.972463 -0.583244  1.142386   
e    1.744620 -0.841092 -0.420305 -0.420305 -1.524430 -0.201024  0.032714   

loc            
S          S4  
a    0.659195  
b    0.814953  
c   -0.950665  
d    0.488441  
e   -0.420305

Another way is selec column by tuple, create MultiIndex and add to original:

mux = pd.MultiIndex.from_product([df.columns.levels[0], ['S4']], names=df.columns.names)
print (mux)
MultiIndex([('location1', 'S4'),
            ('location2', 'S4')],
           names=['loc', 'S'])

df1 = df[('location1','S3')].to_frame('S4').reindex(mux, axis=1, level=1)
print (df1)
loc location1 location2
S          S4        S4
a    1.308520  1.308520
b    0.451843  0.451843
c   -2.162076 -2.162076
d    2.287326  2.287326
e    0.107393  0.107393

df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
loc location1                               location2                      \
S          S1        S2        S3        S4        S1        S2        S3   
a    0.761594 -0.542844  1.308520  1.308520 -0.650264  1.139555  0.823617   
b   -0.419000  1.389820  0.451843  0.451843 -1.866177 -2.122505 -1.640534   
c    1.040347  0.840024 -2.162076 -2.162076  0.538753  1.065711  0.745565   
d   -0.953894  0.173901  2.287326  2.287326 -1.564201 -0.081494  0.444838   
e    0.906456 -0.832980  0.107393  0.107393  0.010980  0.376815 -0.367043   

loc            
S          S4  
a    1.308520  
b    0.451843  
c   -2.162076  
d    2.287326  
e    0.107393

Add a column to each parent column in multi header dataframe

Answers (1)

Related Questions