Reputation: 189
Suppose I have a dataframe of multi-index columns,
TSLA MSFT
Year revenues other_revenues expenses revenues other_revenues expenses
2019 851 10 110 200 13 213
2018 725 11 111 150 14 214
How can I add the inner columns to obtain
TSLA MSFT
Year revenues other_revenues expenses sum revenues other_revenues expenses sum
2019 851 10 110 971 200 13 213 426
2018 725 11 111 847 150 14 214 378
Secondly, in general what are the common functions I should note when working with multi-index columns? Is there a way of thinking when working with multi-index columns? I'm comfortable thinking in normal (single leveled index) but not used to multi-index. Thanks!
Upvotes: 3
Views: 110
Reputation: 15872
You can use stack
and unstack
:
>>> df2 = df.stack(1).unstack(0)
>>> df2.loc['sum', :] = df2.sum()
>>> df2.stack(1).unstack(0).reindex(df.index).reindex(
columns=df.columns.levels[0], level=0
)
TSLA MSFT
expenses other_revenues revenues sum expenses other_revenues revenues sum
Year
2019 110.0 10.0 851.0 971.0 213.0 13.0 200.0 426.0
2018 111.0 11.0 725.0 847.0 214.0 14.0 150.0 378.0
To sum specific columns:
>>> df2 = df.stack(1).unstack(0)
>>> df2.loc['sum', :] = df2.loc[['revenues', 'other_revenues'], :].sum()
>>> df2.stack(1).unstack(0).reindex(df.index).reindex(
columns=df.columns.levels[0], level=0
)
TSLA MSFT
expenses other_revenues revenues sum expenses other_revenues revenues sum
Year
2019 110.0 10.0 851.0 861.0 213.0 13.0 200.0 213.0
2018 111.0 11.0 725.0 736.0 214.0 14.0 150.0 164.0
Or use join
and sum
along axis=1, level=0
:
>>> cols = pd.MultiIndex.from_product([df.columns.levels[0], ['sum']])
>>> df.join(
df.sum(axis=1, level=0).set_axis(
cols,
axis=1
)
).reindex(columns=df.columns.levels[0], level=0)
TSLA MSFT
revenues other_revenues expenses sum revenues other_revenues expenses sum
Year
2019 851 10 110 971 200 13 213 426
2018 725 11 111 847 150 14 214 378
And for custom columns:
>>> df.join(
df.loc[:, (slice(None), ['revenues', 'other_revenues'])]
.sum(axis=1, level=0).set_axis(
cols,
axis=1
)
).reindex(columns=df.columns.levels[0], level=0)
TSLA MSFT
revenues other_revenues expenses sum revenues other_revenues expenses sum
Year
2019 851 10 110 861 200 13 213 213
2018 725 11 111 736 150 14 214 164
Upvotes: 1
Reputation: 862731
First create new DataFrame filled by sum
s and MultiIndex
to df1
:
sub = ['revenues', 'other_revenues', 'expenses']
df1 = df.sum(level=0, axis=1)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])
Then use concat
for join together:
df = pd.concat([df, df1], axis=1)
And lasr for custom order is added reindex
:
mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
MSFT TSLA \
revenues other_revenues expenses sum revenues other_revenues expenses
Year
2019 200 13 213 426 851 10 110
2018 150 14 214 378 725 11 111
sum
Year
2019 971
2018 847
EDIT: You can seelct with slicers (but I think here is necessary sorted MultiIndex
):
idx = pd.IndexSlice
print (df.loc[:, idx[:, ['revenues','other_revenues']]])
TSLA MSFT TSLA MSFT
revenues revenues other_revenues other_revenues
2019 851 200 10 13
2018 725 150 11 14
# df.index.name = 'Year'
sub = ['revenues', 'other_revenues', 'expenses']
df1 = df.loc[:, idx[:, ['revenues','other_revenues']]].sum(level=0, axis=1)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])
df = pd.concat([df, df1], axis=1)
mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
MSFT TSLA \
revenues other_revenues expenses sum revenues other_revenues expenses
2019 200 13 213 213 851 10 110
2018 150 14 214 164 725 11 111
sum
2019 861
2018 736
Upvotes: 2