Reputation: 653
So I have a function that returns a dataframe with a format similar to the df bellow.
header = pd.MultiIndex.from_product([['location1','location2'],
['S1','S2','S3']],
names=['loc','S'])
df = pd.DataFrame(np.random.randn(5, 6),
index=['a','b','c','d','e'],
columns=header)
df
loc location1 location2
S S1 S2 S3 S1 S2 S3
a -1.245988 0.858071 -1.433669 0.105300 -0.630531 -0.148113
b 1.132016 0.318813 0.949564 -0.349722 -0.904325 0.443206
c -0.017991 0.032925 0.274248 0.326454 -0.108982 0.567472
d 2.363533 -1.676141 0.562893 0.967338 -1.071719 -0.321113
e 1.921324 0.110705 0.023244 -0.432196 0.172972 -0.50368
What I would like to do is to create a new column in each location, to create a table like so.
loc location1 location2
S S1 S2 S3 S4 S1 S2 S3 S4
a -1.245988 0.858071 -1.433669 -1.433669 0.105300 -0.630531 -0.148113 -1.433669
b 1.132016 0.318813 0.949564 0.949564 -0.349722 -0.904325 0.443206 0.949564
c -0.017991 0.032925 0.274248 0.274248 0.326454 -0.108982 0.567472 0.274248
d 2.363533 -1.676141 0.562893 0.562893 0.967338 -1.071719 -0.321113 0.562893
e 1.921324 0.110705 0.023244 0.023244 -0.432196 0.172972 -0.50368 0.023244
Everytime I try to create a column in a similar fashion to normal DataFrames, it says the index does not exist...
Is this even possible? Or do I need to recreate the dataframe with the columns I need, and populate the data?
Thank you for your help.
Upvotes: 0
Views: 250
Reputation: 863531
If need new column filled by values location1, S3
use:
df = df.stack(0)
df['S4'] = df.groupby(level=0)['S3'].transform('first')
df = df.unstack().swaplevel(1,0,axis=1).sort_index(axis=1)
print (df)
loc location1 location2 \
S S1 S2 S3 S4 S1 S2 S3
a 0.048870 0.883278 0.659195 0.659195 -1.172545 0.311774 -1.436998
b -0.151088 -0.994126 0.814953 0.814953 -0.097351 0.219566 -0.105680
c 0.164416 -1.241656 -0.950665 -0.950665 -0.413584 -1.626581 -0.116679
d -0.343600 1.652102 0.488441 0.488441 -0.972463 -0.583244 1.142386
e 1.744620 -0.841092 -0.420305 -0.420305 -1.524430 -0.201024 0.032714
loc
S S4
a 0.659195
b 0.814953
c -0.950665
d 0.488441
e -0.420305
Another way is selec column by tuple, create MultiIndex
and add to original:
mux = pd.MultiIndex.from_product([df.columns.levels[0], ['S4']], names=df.columns.names)
print (mux)
MultiIndex([('location1', 'S4'),
('location2', 'S4')],
names=['loc', 'S'])
df1 = df[('location1','S3')].to_frame('S4').reindex(mux, axis=1, level=1)
print (df1)
loc location1 location2
S S4 S4
a 1.308520 1.308520
b 0.451843 0.451843
c -2.162076 -2.162076
d 2.287326 2.287326
e 0.107393 0.107393
df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
loc location1 location2 \
S S1 S2 S3 S4 S1 S2 S3
a 0.761594 -0.542844 1.308520 1.308520 -0.650264 1.139555 0.823617
b -0.419000 1.389820 0.451843 0.451843 -1.866177 -2.122505 -1.640534
c 1.040347 0.840024 -2.162076 -2.162076 0.538753 1.065711 0.745565
d -0.953894 0.173901 2.287326 2.287326 -1.564201 -0.081494 0.444838
e 0.906456 -0.832980 0.107393 0.107393 0.010980 0.376815 -0.367043
loc
S S4
a 1.308520
b 0.451843
c -2.162076
d 2.287326
e 0.107393
Upvotes: 1